# Finding the optimal marathon strategy in data

**Posted:** May 31, 2011

**Filed under:** Own projects

**Tags:** ruby, screen scraping, spss, stockholm marathon, visualization

After half a year of training I ran my third marathon this weekend in Stockholm. It was a great race, and I managed to take three minutes off my own PB and finish in 3.10.

However, the race got me thinking: what is the optimal strategy for a marathon race? That is, how fast should I run in the beginning? I turned to data to look for the answer.

### The Data

Stockholm Marathon gathers up to 20 000 runners every year. For each runner split times are recorded every 5 km. So there is plenty of data to work with here.

To get the data I wrote a Ruby script that scans the results of 2010 and records the results of all the (male) runners that participated in both 2010 and 2009. The scrape gave me a dataset of 3965 runners. For each runner I stored the variables age, finish time and time at 10 km.
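The matching step can be sketched roughly like this. It is not the original script: the record layout (`:name`, `:finish`) and the idea of matching runners by name are assumptions made for illustration, and the scraping itself is left out.

```ruby
# Hypothetical matching step: keep only runners that appear in both years.
# Assumption: each result is a hash with at least a :name key, and the
# name is good enough to identify a runner (namesakes would break this).
def runners_in_both(results_2009, results_2010)
  # Index the 2009 results by name for fast lookup.
  by_name_2009 = results_2009.each_with_object({}) do |result, index|
    index[result[:name]] = result
  end

  # Walk through 2010 and keep the runners that also have a 2009 result.
  results_2010.each_with_object([]) do |result, matched|
    earlier = by_name_2009[result[:name]]
    next unless earlier

    matched << { name: result[:name], y2009: earlier, y2010: result }
  end
end
```

The returned array holds one entry per runner, with both years side by side, which is a convenient shape for the comparisons that follow.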

To make the two races comparable I multiplied the times of 2009 by 0.967, as it turned out that the average time of 2010 was only 4.04.44, compared to 4.12.58 in 2009. Presumably the conditions were better in 2010.
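The scaling factor is simply the ratio of the two average times. Here is a small sketch of that arithmetic; the helper names are made up for this example, and times are assumed to be written in the `h.mm.ss` notation used in this post.

```ruby
# Convert a time written as "h.mm.ss" into seconds.
def to_seconds(time)
  hours, minutes, seconds = time.split(".").map(&:to_i)
  hours * 3600 + minutes * 60 + seconds
end

MEAN_2010 = to_seconds("4.04.44") # 14684 seconds
MEAN_2009 = to_seconds("4.12.58") # 15178 seconds

# Factor that maps 2009 times onto the 2010 scale.
SCALE_2009 = (MEAN_2010.to_f / MEAN_2009).round(3) # => 0.967

# Hypothetical helper: adjust a 2009 finish time given in seconds.
def adjust_2009(seconds)
  seconds * SCALE_2009
end
```

A single factor treats every runner the same, which is a simplification: better conditions may well help a five-hour runner differently than a three-hour runner.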

### The Analysis

We assume that every runner was in equally good shape in 2009 and 2010. This means that a good run can be defined as a run where the time is better than in the other year. If runner A does 4.00 in 2009 and 3.45 in 2010, he has performed better in 2010. But the question is: did the starting pace affect the result?

Let's start by splitting the runners into three groups: first, the ones that improved their times by at least 10 percent in 2010 (green); second, the ones that did close to the same result in 2009 and 2010 (blue); and third, the ones that did at least 10 percent worse in 2010 (red).
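In code, the grouping could look something like this. The exact cut-offs are an assumption: "at least 10 percent" is read here as a 2010/2009 ratio of 0.90 or lower for green and 1.10 or higher for red, with the 2009 time already adjusted for conditions.

```ruby
# Classify a runner by the ratio of the 2010 time to the 2009 time.
# Both times must be in the same unit (seconds, minutes, ...).
# Assumed thresholds: <= 0.90 is green, >= 1.10 is red, the rest is blue.
def group_for(time_2010, time_2009)
  ratio = time_2010.to_f / time_2009
  if ratio <= 0.90
    :green
  elsif ratio >= 1.10
    :red
  else
    :blue
  end
end

# A runner going from 4.00 (240 minutes) to 3.45 (225 minutes):
group_for(225, 240) # => :blue
```

Note that a 15 minute improvement on a four hour marathon is only about 6 percent, so with these thresholds such a runner still lands in the blue group.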

Fear not if you don’t get this picture immediately. I’ll try to explain:

- The y-axis is the relative pace at 10 km. Values above 0 mean that the starting pace was slower than the average pace for the whole race. Values below 0 mean that the starting pace was faster than the average pace for the whole race.
- The x-axis is the finish time for 2010 divided by the finish time for 2009. If it is less than 1, the time in 2010 was better than the time in 2009.
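The two axes described above can be sketched as follows. The exact formula behind the relative pace is not spelled out in this post, so the definition here (pace over the first 10 km divided by average pace, minus 1) is an assumption that merely matches the description: 0 for an even race, negative for a fast start. The function names are made up too.

```ruby
MARATHON_KM = 42.195

# Relative pace at 10 km, with both times in seconds.
# 0 means the first 10 km were run at the average pace of the whole race,
# negative means a faster start, positive a slower start.
def relative_start_pace(split_10k, finish)
  start_pace   = split_10k / 10.0     # seconds per km over the first 10 km
  average_pace = finish / MARATHON_KM # seconds per km over the whole race
  start_pace / average_pace - 1.0
end

# Finish time for 2010 divided by finish time for 2009.
def year_ratio(finish_2010, finish_2009)
  finish_2010.to_f / finish_2009
end

# A 4.00 finisher (14400 s) who passes 10 km in 55 minutes (3300 s):
relative_start_pace(3300, 14_400) # negative (about -0.03): a fast start
```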

What does the graph tell us? As you can see, the green runners (the ones that improved their times in 2010 compared to 2009) start the race at almost the same pace as they will keep for the rest of the race. In other words: the starting pace is almost the same as the average pace for the whole 42 km.

The red runners (the ones that did worse in 2010 than 2009), on the other hand, don’t manage to keep up the speed. Their first 10 km are considerably faster than the average speed for the rest of the race.

How big is the difference? An average green runner (one heading for a time of 4.00) does the first 10 km 20 seconds faster than his average pace for the whole race. The blue runner is 36 seconds faster, and the red runner 1.24 minutes faster.

### The Conclusion

What is the optimal marathon strategy then? It should come as no surprise to anyone who has ever run 42 km that it is only after 30 km that the race starts. The difference between the runners that managed to improve their times between 2009 and 2010 and the ones that didn’t is that the latter group was not able to keep a steady pace throughout the race.

I myself started at an optimistic pace of 4.15-4.25 min/km and ended up around 4.45-4.55 min/km. Not a perfect way to set up a race according to this analysis, but good enough to set a PB.

### A Sidenote

So, we now know that a steady pace is what you should aim for when you run a marathon. Starting too fast is a risky strategy. As we could see in the previous graph, many of the fastest starters in 2010 ended up performing poorly. But what about the fast starters in 2009? Did they learn from their mistakes?

Actually, yes. In this next graph I have singled out the 10 percent of the runners who started fastest in 2009 and compared their results in 2009 and 2010.
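Picking out that group could be sketched like this. It assumes that "started fastest" means the lowest relative starting pace in 2009 (rather than the fastest absolute pace), and the `Runner` struct is a made-up stand-in for whatever the real dataset looks like.

```ruby
# Hypothetical record: a runner with his relative starting pace in 2009
# (negative values mean a start faster than his own average pace).
Runner = Struct.new(:name, :relative_pace_2009)

# Return the given percentage of runners with the fastest relative start.
# Integer arithmetic avoids rounding surprises; at least one runner is kept.
def fastest_starters(runners, percent = 10)
  count = [runners.size * percent / 100, 1].max
  runners.sort_by(&:relative_pace_2009).first(count)
end
```

The same runners can then be looked up in the 2010 results to see whether their starting pace and finish time changed.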

The setup is the same as in the previous graph. Y-axis is the relative starting pace. X-axis the finish time compared to the other year (for 2010: 2010/2009, for 2009: 2009/2010). The blue dots represent 2009, the green 2010.

Two conclusions can be made:

- The majority of the runners started slower in 2010 (or they were able to keep the pace better).
- Most of the runners improved their times (apparently they had learnt something).

…

Want to explore the data yourself?