Monday, June 9, 2014

Completing the 2014 World Cup Panini album

This post is a little different from the previous ones, but it is still about something that has interested me for a long time.

In Colombia there is a great tradition of completing the official album of the FIFA World Cup every 4 years. The album includes one sticker per player for each participating country, as well as some complementary stickers for the stadiums and some very shiny ones for the cup and the official logo. In total there are 642 stickers, which can be bought in envelopes of 7 at $1 per envelope. With the album itself costing $2, filling it by buying only the stickers needed would cost around $94 (92 envelopes plus the album). But of course that will never happen, since each envelope holds a random sample of the full set of stickers.

My question is then: if there is no exchange of stickers, how much would it cost on average to complete the album?

Let's assume the stickers are manufactured in equal numbers (which the manufacturer claims, but who knows). Let's also assume we can buy as many packs as we want, one after another.
Without any exchange of stickers the task gets very expensive: on average it takes about 3900 stickers to fill the album, which is about $550.
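
The estimate can be explored with a short Python sketch. It is only an illustration under stated assumptions, not the calculation behind the number above: every sticker is equally likely, the stickers inside one pack are all different (my assumption), and the function names are made up for this example. The first function is the classic coupon-collector formula for stickers bought one at a time; for 642 stickers it comes out at roughly 4,500 stickers, or about 650 envelopes, so a simulation under these assumptions may not reproduce the figure quoted above exactly.

```python
import random


def expected_single_draws(n):
    """Classic coupon-collector mean: n * (1 + 1/2 + ... + 1/n)."""
    return n * sum(1.0 / k for k in range(1, n + 1))


def simulate_packs(n=642, pack_size=7, rng=None):
    """Count packs bought until all n stickers are owned (one simulated collector)."""
    rng = rng or random.Random()
    owned = set()
    packs = 0
    while len(owned) < n:
        # Assumption: the stickers inside one pack are all different.
        owned.update(rng.sample(range(n), pack_size))
        packs += 1
    return packs


def average_cost(trials=200, price_per_pack=1.0, album_price=2.0, seed=1):
    """Monte Carlo estimate of the mean cost of a full album, in dollars."""
    rng = random.Random(seed)
    mean_packs = sum(simulate_packs(rng=rng) for _ in range(trials)) / trials
    return mean_packs * price_per_pack + album_price


if __name__ == "__main__":
    print("closed-form stickers:", round(expected_single_draws(642)))
    print("simulated mean cost: $%.0f" % average_cost())
```

Raising `trials` makes the simulated average more stable at the price of a longer run.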





Thursday, May 22, 2014

Can we determine the gender of a given runner from his or her split times?

From my previous post, we found out that there is a clear difference in the fractional split pace between male and female runners.

The question I want to address now is whether the split times provide a good way to predict the gender of a runner.

To approach the question, a preliminary data analysis like the one presented before suggests that the splits at the 5, 10, 15, 20, 25, 30, 35 and 40K checkpoints, together with the total time, provide enough information to attempt a classification problem where the target variable is binary (1 for a male runner and 0 for a female runner).

I started with the entire set of runners in the 2011 NYC Marathon, randomly sampled into a training set and a test set, and tried a few algorithms. Here I will present the results with k-NN.

k-NN:

Two heads are better than one and k heads are even better than one.
k-NN relies on the idea that, locally, the best decision is the one that the majority of your friends agree on. Many assumptions are made to get to this claim, but it seems very reasonable. One important thing to consider here is the meaning of "close friends": how do you determine who your closest friends are? It is clear that this question goes beyond the geographical sense of the word, and it requires a different way to measure distances, that is, a metric.

In the case at hand, the input data are the split times and the labels (1 for male and 0 for female).
Many metrics were implemented, among them Euclidean, Manhattan and dot product.
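
To make the idea concrete, here is a minimal k-NN sketch in plain Python. It is not the code used for the marathon data: the features are synthetic (two Gaussian blobs standing in for split times), the function names are invented for this example, and only the Euclidean and Manhattan metrics are shown. The dot product is left out because it is a similarity (larger means closer) rather than a distance, so it would need the ranking reversed.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=5, metric=euclidean):
    """Majority vote among the k training points closest to `query`."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: metric(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def make_toy_data(n_per_class=100, seed=0):
    """Synthetic stand-in for split times: two Gaussian blobs, NOT race data."""
    rng = random.Random(seed)
    data = []
    for label, centre in ((1, 0.0), (0, 5.0)):
        for _ in range(n_per_class):
            data.append(([rng.gauss(centre, 1.0) for _ in range(3)], label))
    rng.shuffle(data)
    return data


def accuracy(metric, k=5, seed=0):
    """Random train/test split, then fraction of test points classified correctly."""
    data = make_toy_data(seed=seed)
    train, test = data[:150], data[150:]
    train_X = [x for x, _ in train]
    train_y = [y for _, y in train]
    hits = sum(knn_predict(train_X, train_y, x, k, metric) == y for x, y in test)
    return hits / len(test)


if __name__ == "__main__":
    for name, metric in (("euclidean", euclidean), ("manhattan", manhattan)):
        print(name, accuracy(metric))
```

An odd k avoids tied votes when there are only two labels. With real split times the features would usually be rescaled first, since the late checkpoints have much larger values than the early ones and would dominate either distance.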





Wednesday, May 21, 2014

Keeping a constant pace

Some people might consider running "boring". But there is some strategy involved. And the pace is the key parameter to consider.

In an event like a marathon, keeping a constant pace is definitely a challenge and those who manage to maintain the same pace throughout the whole race seem to perform better.


In most cases the first half of the race is run faster than the second. This motivates a measure of the asymmetry: the difference between the times of the two halves divided by the total time. For instance, if the pace is constant the asymmetry factor is zero. Two extreme cases can be considered: the entire time being spent in the first half of the race, and the other way around.

With that definition, most of the athletes will have positive values of the asymmetry factor while the elite runners will swarm around zero.
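
As a small illustration, the asymmetry factor could be computed as below. The sign convention (second half minus first half) is my reading of the statement that most athletes end up with positive values; the function name and the sample times in minutes are made up for the example.

```python
def asymmetry_factor(first_half, second_half):
    """(second half time - first half time) / total time, in any consistent unit."""
    total = first_half + second_half
    if total <= 0:
        raise ValueError("total time must be positive")
    return (second_half - first_half) / total


if __name__ == "__main__":
    # Hypothetical runners, times in minutes.
    print(asymmetry_factor(120, 120))  # even pace
    print(asymmetry_factor(110, 130))  # faster first half, positive value
    print(asymmetry_factor(125, 115))  # faster second half, negative value
```

With this convention the two extreme cases mentioned above correspond to values of -1 and +1.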


In the figure above, the values of asymmetry factor are shown for male and female athletes in NYC Marathon 2011. 

There is also a plateau in the AsymFactor near 5h, which suggests that beyond that finishing time all the athletes have a bad second half regardless of the time.

The evolution of the pace over a race can be tracked using the checkpoints at 5, 10, 15, 20, 25, 30, 35 and 40 km. The ratio of the pace at any given checkpoint to the average pace, denoted Fractional Pace, can provide a lot of information about the performance.
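
A possible way to compute the Fractional Pace from checkpoint times is sketched below. It assumes that the "pace at a checkpoint" means the pace over the 5 km segment ending at that checkpoint, that times are cumulative minutes, and that the average pace is the finish time divided by 42.195 km. These are my assumptions, as are the names used in the code.

```python
MARATHON_KM = 42.195
CHECKPOINTS_KM = [5, 10, 15, 20, 25, 30, 35, 40]


def fractional_paces(cumulative_minutes, total_minutes):
    """Segment pace at each checkpoint divided by the average pace of the race."""
    average_pace = total_minutes / MARATHON_KM  # minutes per km
    fractions = []
    prev_km, prev_time = 0.0, 0.0
    for km, t in zip(CHECKPOINTS_KM, cumulative_minutes):
        # Pace over the segment that ends at this checkpoint.
        segment_pace = (t - prev_time) / (km - prev_km)
        fractions.append(segment_pace / average_pace)
        prev_km, prev_time = km, t
    return fractions


if __name__ == "__main__":
    # Hypothetical runner holding exactly 6 min/km from start to finish.
    even_splits = [6.0 * km for km in CHECKPOINTS_KM]
    print(fractional_paces(even_splits, 6.0 * MARATHON_KM))
```

Because pace is time per distance, a value above 1 means the runner was slower than their own average on that segment, and a value below 1 means faster.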

Below are the distributions of the fractional pace for male and female athletes at several checkpoints.

It is interesting to note the two-peak structure in the distribution for 5km. The narrow peak, corresponding to paces closer to the mean value, is typical of elite athletes.


The distribution for 35km shows a displacement towards larger fractional paces. 


The peaks of the distribution for men and women are shifted. In the case of women, at 35K the pace is somewhat faster than the mean value during the race.

In conclusion, the ability to maintain a constant pace as the race evolves seems to be a determining factor in the performance of the runner. The best strategy is then to choose a pace that allows the runner to run evenly throughout the race.

Sunday, April 6, 2014

Running NYC Marathon before it was cool

Iconic marathons such as Boston, London, Berlin and New York are always among the top picks of any runner. I have been blessed with the opportunity of running the NY marathon in 2011, and almost in 2012, when Hurricane Sandy prevented the event from happening.

The NYC marathon has been around since 1970, when 127 runners, only one of them a woman, started the race. Four decades later the number of participants was almost 50 thousand and the ratio was about 4 female runners for every 7 male runners. According to New York Road Runners, in 1970 Fred Lebow and Vince Chiappetta organized the first New York City Marathon, with only 55 finishers. The course consisted of loops around Central Park, cheered by a handful of running enthusiasts.

But with time, the Marathon grew in numbers, both in participants and spectators. In the figure below the number of participants at the finish line is shown, with female runners in red and male runners in blue. The dip in the number of participants in 2001 is the direct consequence of the tragic events of 9/11. What is interesting is the apparent tendency toward lower attendance starting in 1999. Since 2001 participation has increased continuously, growing faster among female runners, with the support of ING.



What powered such a big improvement? What makes the NYC marathon one of the most popular races in the world?

For one thing, it is NYC! The most amazing city in the world can only have the most amazing marathon in the world. Starting in Staten Island in plain sight of the Statue of Liberty while blasting Sinatra's "New York, New York" makes every minute of training and every mile of the trip worth it. Also, New York Road Runners always outdoes itself in the logistics, coverage and publicity of the race, so the marathon has become an everlasting experience.

In New York you can find a race every other weekend, from 5K to ultramarathons. The culture of running is so close to the city's heart that a marathon is almost a natural goal to aim for once the running bug is in you. When you are in the Big Apple, make sure you stop by Central Park or Prospect Park and you will see hundreds of runners of all ages, backgrounds and fitness levels. This City is a Running City, so the Marathon is not an isolated event but part of the life and the spirit of the City.

Coverage from the media helps to position the marathon in our collective memory. ABC and ESPN provide live coverage of the event, with commentators and invited hosts to entertain the more than 580 thousand viewers of the 2013 NY marathon, according to a press release from NYRR.

But as usual, starting an event like the marathon was not an easy task. The numbers increased at a low pace until 1976 when the course was changed and started covering the whole city.


A question that we can ask of the data is: when will the male-to-female ratio reach 1? Assuming the trend of the last couple of years stays constant, that won't happen in the next 20 years at least. However, the rapid growth in the number of participants will pose more critical challenges to the logistics of the event.

The Internet has also witnessed the world's interest in the NYC marathon. Using Google's Correlate tool, comparative trends for the searches "nyc marathon" and "ing new york" are shown. There is a clear periodic high peak toward the end of the year, close to the date of the event, and also a much smaller peak in April, after the Boston Marathon.



"If I can make it there, I can make it anywhere" sang Frank Sinatra in a love letter to the Big Apple. And I would also say "If I can finish it there, I can finish it anywhere". New York will continue growing and getting more and more awesome. This race is to me the best first-marathon experience any runner could aspire to.

Stick around for more stories from the numbers.









Saturday, April 5, 2014

Put down the scissors!

Everyone has heard, at least once, about the dangers of running with scissors. Horrified parents screaming at kids who, unaware of possible accidents, sprint around the house holding what could be a lethal weapon.

Every runner, no matter the pace, experience or goals, wants to do BETTER: run faster than last week, break that personal record that has always been a source of pride, or shed a few pounds.

The first runs are always tough because our bodies are not used to such stress. And every step will remind us of how angry our legs are with us for pushing our limits. But it is just a matter of time before we start looking forward to the next run. As a matter of fact, right after my first half-marathon, while desperately trying to rehydrate, I asked myself: "Why do I do this?! I will never EVER put myself through this AGAIN". A few hours later I found myself registering for my second race.

Just like everything else, improving your running is a process. It requires patience, endurance and planning. Personally I like to plan my training based on the KISS principle: Keep It Simple, Stupid. This can also be put as Train Smart.

The amount of information available on running and athletic performance is huge, and a closer look at that fascinating topic generates millions of questions about the performance of the individual and the collective results in a race.

In this blog I will share some of my thoughts about running and training based on some facts gathered from races, personal experience and studies on the various matters of running.

And after all, running with facts won't make anyone yell at you!