In cycling, teams constantly try to outdo each other, not only by using the most advanced equipment, but also by improving nutrition, training methods and all kinds of other areas. That’s why Innovation Origins is looking for innovations inside the peloton in the run-up to the Tour de France.
For many people, predicting the winner of a Tour stage is a way to make the long hours in front of the TV bearable. Who grabs the sprinter’s jersey? Do the climbers have enough power in their legs? And – perhaps most important of all – is your team’s front-runner wearing that prestigious yellow jersey in Paris? These questions reappear every year. The day before the Tour you scour the internet looking for riders who might surprise you. And after a lot of deliberation, you yet again include the top favorites in your virtual cycling team.
But what if you were able to let a computer do all that work for you? Arjan Zoer developed a predictive algorithm that he uses to forecast race results. I fill in my betting pool choices together with him, in the hope that they will end up on top this year.
Who is the best?
For Zoer, it started with the Cycling Manager game, a simulation game that has a huge database with all the riders and their statistics. “The nice thing about this was that you could change the database. This led to discussions with fanatical players all over the world,” says Zoer. “The Spaniards thought Contador was the best climber, the Italians thought Nibali was and I, as a Dutchman, thought that Robert Gesink had a lot of potential. Then I thought: surely all of this can be calculated, right? I started to get results from different sites and by using a few mathematical tricks, I converted these into features of individual riders which you could use in the game.”
It didn’t end there for Zoer: “In any case, if I have these features and statistics of riders, I might as well look at real races in order to predict who is the most likely to get good results.” Where Zoer at first only entered statistics, he has since extended the database with team tactics and helpers, and has adjusted the calculations for time trials. “I’ve been working on this for five or six years, at least 30 hours a week every holiday. And at least one hour a day on weekdays. It’s a hobby that has gotten out of hand, but it energizes me. It gives me satisfaction to get as close as possible with my calculations. Luckily, I have a very lovely wife who gives me that space,” Zoer laughs on the phone.
Accounting for the field of competitors
When Zoer was just getting started, a fairly unknown name occasionally rolled out of the computer. Zoer solved this by not only entering the race results, but also taking into account the race profile and the number of participating cyclists at the start. “Two years ago there were sprinters in China who collected as many points as Marcel Kittel and Mark Cavendish. According to the model, at least. However, Kittel and Cavendish were unbeatable – certainly in that period. I make sure that the algorithm takes that into account now.”
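The article does not reveal how Zoer's algorithm does this, so the following is only a minimal sketch of the general idea: scale the points for a result by the strength and size of the field. The `Result` fields, the 0-to-1 `field_strength` rating, the 150-rider reference size and the points formula are all assumptions made for illustration, not Zoer's method.

```python
from dataclasses import dataclass


@dataclass
class Result:
    rider: str
    position: int          # finishing place, 1 = winner
    field_strength: float  # hypothetical 0..1 rating of the start list
    starters: int          # number of riders at the start


def weighted_points(result: Result, base_points: float = 100.0,
                    reference_field: int = 150) -> float:
    """Score a result so that wins in weak or small fields count for less.

    All constants here are illustrative assumptions.
    """
    # Points fall off with finishing position: 1st = base, 2nd = base / 2, ...
    position_score = base_points / result.position
    # Small fields are discounted; fields at or above the reference size are not.
    size_factor = min(result.starters / reference_field, 1.0)
    return position_score * result.field_strength * size_factor


# Invented example values: a win in a small, weak field versus a big race.
minor_win = Result("Sprinter A", position=1, field_strength=0.2, starters=100)
major_win = Result("Sprinter B", position=1, field_strength=0.9, starters=176)
```

With these made-up inputs, `weighted_points(major_win)` comes out well above `weighted_points(minor_win)`, which is the behavior the paragraph above describes: equal finishing places no longer earn equal points.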
In the first two weeks of the recent Giro d’Italia, Zoer’s algorithm correctly predicted three stage winners. “Though the winners were always in the top 5 that came out of the computer, so there is an upward trend in the predictions. That’s nice. But I did make a big mistake. Carapaz was fifteenth in the model – but he won.” Zoer explains that this is because Carapaz is still a young rider. Via the internet, Zoer set a group of volunteers to work who looked up results and statistics of youth competitions. “This allowed me to take a young rider like Carapaz into account at an early stage. This element was still slightly lacking in the model. Predicting youth races is completely different anyway; these races are much more erratic. After the Giro d’Italia, I reran the model constantly with other correlations. Then I found out that if I give more weight to the results of the last three races, the results will be closer to reality.”
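Giving extra weight to the last three races can be sketched as a weighted average. The function name, the weight of 2.0 and the example scores below are assumptions for illustration; the article does not say which weights Zoer actually uses.

```python
def form_score(scores, recent_n=3, recent_weight=2.0):
    """Weighted average of race scores, ordered from oldest to newest.

    The last `recent_n` races count `recent_weight` times as much as
    older ones. Both values are illustrative assumptions.
    """
    if not scores:
        return 0.0
    first_recent = len(scores) - recent_n
    weights = [recent_weight if i >= first_recent else 1.0
               for i in range(len(scores))]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Invented scores for a rider whose last three races went much better.
season = [10, 10, 10, 40, 40, 40]
plain_average = form_score(season, recent_weight=1.0)  # 25.0
recent_heavy = form_score(season)                      # 30.0
```

A rider in improving form moves up when recent results count for more, which is one way a model could pick up a late-blooming contender sooner.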
Zoer also predicts all the races in women’s cycling: “Actually, that’s much more fun. There are fewer statistics available and I know less about them myself. If the computer comes up with a correct prediction, I learn something new.”
#GiroRosa🇮🇹 #Algorithm #Prediction #UCIWWT
⭐️⭐️⭐️A. van Vleuten, A. van der Breggen, E. Longo Borghini
⭐️⭐️K. Niewiadoma, L. Kirchmann, A. Moolman
⭐️C. Uttrup Ludwig, A. Spratt, A. Santesteban Gonzalez, A. Pieters
This has a better sorting @AvVleuten to win the #GC pic.twitter.com/fJPyAcbNGR
– Arjan (@ZoerCyclingStat) 3 July 2019
“That’s the nice thing about it, for me it’s a big puzzle. I want to solve it, and I’m always looking for correlations. Look, if I predict a mass sprint and it turns out to be a breakaway, then so be it. But if the results for a mountain stage are completely different from what I predict, then it’s going to be a long night. I’ll go on until I find something and then I can really lose track of time,” admits Zoer. What has to be done differently? Why isn’t it right now? What should the model take into account? “I already had to archive my Excel files, because I had over a million rows. Now I have a million and a half records of just results. I’ve already got so much from the past six weeks – results, profiles, form. It just goes on and on.”
Geo-locations and weather conditions
Zoer would also like to work with Strava, although not in order to get hold of the wattage data of professionals. “As soon as I use those, no one will make them public anymore.” Instead, Zoer wants to link results to weather conditions and air quality. “During this Tour there are relatively many stages above 2,000 m. If I can automatically retrieve geo-locations and weather conditions via Strava, I will also be able to find correlations. Valverde is said to get worse when the air is thin. It would be nice to be able to make a statistic for that.”
Could teams also benefit from his algorithm? “Look at Moneyball, it also focuses purely on statistics. I’m now looking to make a searchable database out of it. But I’ve put so much work into it that I don’t really want to divulge everything about the algorithm. It seems like a great idea to simply have a database where people themselves can specify what kind of rider they’re looking for. Moreover, it can also be of use to teams.”
A not-so-surprising winner
And who should not be missing from the Tour betting pool this year? “It’s very boring, but Geraint Thomas is going to win the Tour. And Kruijswijk will be third. But there are also surprises in it: Thibaut Pinot will be 25th according to the forecast. But he should be able to do much better. And I sincerely hope that Kruijswijk will win the Tour too. It’s really not like I’m totally committed to the model. I don’t always agree with the computer.”
According to Zoer, these riders should not be missing from your betting pool:
1. Geraint Thomas
2. Nairo Quintana
3. Steven Kruijswijk
4. Romain Bardet
5. Adam Yates
6. Dan Martin