If you use Twitter or Facebook, you may occasionally see status updates that look like: “I crushed a 7.2 mi run with a pace of 7’24” with Nike+ Running App. #nikeplus.” About a month ago I started collecting those tweets into a database, to see what interesting stats I could extract. Here are the results.
I collected about 300k tweets, of which about 85k had both the distance and the pace of the run. Some were in miles and some in kilometers, so I normalized everything to the metric system. I cleaned up some obvious errors or misuses of the system, e.g. people who “ran” at inhuman speeds such as 50 mph. Cheetahs :). I used the remaining data to plot a few charts. The first one is a histogram of distances people ran, split into weekdays and weekends:
A few insights from this chart:
- As expected, longer distances quickly become infrequent.
- People run longer on weekends. In fact, the average distance for a weekend run is 8.75 km compared with 6.47 km on weekdays.
- Certain distances seem to be particularly popular during weekends, and they look like typical race lengths: 10k, 16k (10 miles), the half marathon and the marathon.
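The numbers above rest on the conversion and cleanup step described earlier. Here is a minimal sketch of how that step could look in Python. It is not my actual code: the regular expression is guessed from the single example tweet at the top of the post, the pace is assumed to be per mile when the distance is in miles and per kilometer otherwise, and the 25 km/h cutoff and the name `parse_run` are made up for illustration.

```python
import re

MILE_IN_KM = 1.609344
# Assumed cutoff for "inhuman" speeds; the real threshold may differ.
MAX_SPEED_KMH = 25.0

# Pattern guessed from one example: "7.2 mi run with a pace of 7'24""
RUN_PATTERN = re.compile(
    r"(?P<dist>\d+(?:\.\d+)?)\s*(?P<unit>mi|km)\s+run\s+with\s+a\s+pace\s+of\s+"
    r"(?P<min>\d+)['’′](?P<sec>\d{2})"
)


def parse_run(text):
    """Return (distance_km, speed_kmh), or None if the tweet is unusable."""
    match = RUN_PATTERN.search(text)
    if match is None:
        return None  # no distance and pace in this tweet

    # Length in km of the unit used by the tweet (mile or kilometer).
    unit_km = MILE_IN_KM if match.group("unit") == "mi" else 1.0
    distance_km = float(match.group("dist")) * unit_km

    # Pace is minutes'seconds" per unit of distance.
    pace_seconds = int(match.group("min")) * 60 + int(match.group("sec"))
    if pace_seconds == 0:
        return None

    speed_kmh = unit_km * 3600.0 / pace_seconds
    if speed_kmh > MAX_SPEED_KMH:
        return None  # cheetah

    return distance_km, speed_kmh
```

With these assumptions, the example tweet from the top of the post comes out at roughly 11.6 km and 13 km/h, while a "run" at 50 mph (a pace of 1'12" per mile) is dropped.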
Here is a chart highlighting the difference between weekends and weekdays. For the comparison I rescaled both datasets as if I had 100k runs of each, since I obviously collected more weekday runs than weekend runs.
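The rescaling to 100k runs can be sketched as follows. This is an illustration under assumptions, not my actual code: the 1 km bin width, the function names, and the input shape (pairs of a `datetime.date` and a distance in km) are all hypothetical.

```python
from collections import Counter
from datetime import date


def split_by_weekend(runs):
    """Split (date, distance_km) pairs into weekday and weekend distances."""
    weekday, weekend = [], []
    for day, distance_km in runs:
        # date.weekday() returns 5 for Saturday and 6 for Sunday.
        (weekend if day.weekday() >= 5 else weekday).append(distance_km)
    return weekday, weekend


def normalized_histogram(distances_km, bin_km=1.0, target_runs=100_000):
    """Count runs per distance bin, rescaled as if there were target_runs runs."""
    if not distances_km:
        return {}
    counts = Counter(int(d // bin_km) for d in distances_km)
    scale = target_runs / len(distances_km)
    # Keys are the lower edge of each bin, in km.
    return {b * bin_km: n * scale for b, n in sorted(counts.items())}


# Made-up sample data, only to show the call shape.
sample = [
    (date(2024, 1, 1), 5.0),   # Monday
    (date(2024, 1, 2), 5.5),   # Tuesday
    (date(2024, 1, 3), 7.2),   # Wednesday
    (date(2024, 1, 6), 10.1),  # Saturday
]
weekday_km, weekend_km = split_by_weekend(sample)
weekday_hist = normalized_histogram(weekday_km)
weekend_hist = normalized_histogram(weekend_km)
```

Because both histograms sum to the same total, the bars can be compared directly even though the two groups have different sizes.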
How about speeds? I tried to see if people run faster on weekends; it turns out that on average they don’t. I did expect the run length to make a difference in speeds, and it does. Here are the speeds for 5k and 10k runs; 5k runs are obviously a little faster on average.
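One way to make that comparison, sketched in Python: pick the runs whose distance is close to 5 km or 10 km and average their speeds. The 0.25 km tolerance, the function name, and the sample numbers are assumptions for illustration, not values from my data.

```python
from statistics import mean


def mean_speed_near(runs, target_km, tolerance_km=0.25):
    """Average speed (km/h) of runs within tolerance_km of target_km.

    runs is a list of (distance_km, speed_kmh) pairs. Returns None when
    no run falls inside the window.
    """
    speeds = [
        speed_kmh
        for distance_km, speed_kmh in runs
        if abs(distance_km - target_km) <= tolerance_km
    ]
    return mean(speeds) if speeds else None


# Made-up sample runs, only to show the call shape.
sample_runs = [(5.1, 11.0), (4.9, 12.0), (10.0, 10.0), (10.2, 10.5), (21.1, 9.5)]
speed_5k = mean_speed_near(sample_runs, 5.0)
speed_10k = mean_speed_near(sample_runs, 10.0)
```

A wider tolerance pulls in more runs but blurs the difference between the two groups, so it is worth trying a few values.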
The average speed for people who track their runs with #nikeplus seems to be around 10 or 11 km/h, which is a bit slower than I had guessed. Of course, I have no idea if this self-selected group of people who share their runs on the internet is representative of all runners. However, if you’ve ever run a popular race, this curve will seem very familiar to you: the middle of the pack is always crowded, and the elite athletes are few and spread out. The same goes for the slow runners and walkers who finish as the roads are being reopened to traffic.
Can you think of more insights you would extract from the data? My code is on GitHub if you’d like to play with it.