# Glicko rating system: A new method of evaluating sprinting performance

It is difficult to agree on who is the best sprinter of the current peloton. One rider wins a couple of bunch sprints, and cycling fans and press express their admiration for him. Then, while he takes a deserved break at some point in the season, another rider may become the focus of attention and suddenly the debate opens up again.

In order to rank riders in an objective way, taking into account past results and the quality of their opponents, I applied the Glicko rating system, which is already used in sports like tennis or chess. With this approach, I treated each bunch sprint in 2021 as a game between players with a rating that reflects their results in previous matches. After each game, their ratings are adjusted according to the outcome of the contest.

Until now, I had simply relied on basic statistics like win percentage or average rank to assign a number to the performance of a sprinter, but these had obvious limitations. How many riders contested that sprint? Was the level of the opponents any good? Someone could win a mediocre race and appear high on those lists without really doing anything out of the ordinary, so I researched rating systems used in other sports and came across the Glicko rating.

The Glicko rating was invented by Mark Glickman as an improvement on the Elo rating system. It addresses some of the simplifications of Elo and incorporates additional parameters that increase the reliability of the score. Since I had to build the rating system from scratch either way, I went for the improved version.

Instead of ranking riders based on a single **rating** value, the Glicko rating system also takes into consideration their **rating deviation** (RD). The RD measures the uncertainty in a rating, distinguishing players that have reached a similar rating in different ways, mainly in the number of games contested. Both values should be combined into a 95% confidence interval to measure true skill.
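A minimal sketch of that interval, assuming the usual normal approximation of the rating plus or minus 1.96 times the RD. The function name and the two example riders are hypothetical and only illustrate the idea.

```python
def rating_interval(rating, rd, z=1.96):
    """Return the (low, high) bounds of an approximate 95% interval."""
    return rating - z * rd, rating + z * rd


# Two hypothetical riders with the same rating but different amounts of evidence.
proven = rating_interval(1700, 60)     # many sprints contested: narrow interval
newcomer = rating_interval(1700, 250)  # few sprints contested: wide interval
print(proven, newcomer)
```

Both riders share the same rating, but the lower bound of the newcomer's interval is far lower, which is why the rating alone is not enough to compare them.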

This RD plays an important role in updating the sprinter’s overall ability. The change is smaller if the prior RD is low, because the rating is already considered to be accurate, or if the RD of the other contender is high, as the true level of that rider is still unknown.

Another measure included in the Glicko-2 calculations is the **volatility**, which indicates how consistent the player is. Its value is low if the player gets similar results in most of the matches he takes part in.

In the beginning, it is common practice to assign a rating of 1500 and a rating deviation of 350 to all the players, and start updating from there. I retrieved the ratings at the end of each month to analyze the evolution of all the riders I have been tracking this year.

**Practical implementation**

The Glicko rating system is mainly applied in 1vs1 scenarios like a chess or tennis match, and I could not find an out-of-the-box solution that could be applied to the particularities of cycling in a straightforward way. I started from the glicko2 Python package and made some adaptations to the code.

Riders’ ratings depend on the ratings of their opponents and the results scored against them. To determine the exact number of points a rider would win or lose after a sprint, several complex mathematical calculations are needed.

For every “game” or bunch sprint, we need to know the prior rating and RD of all the contenders, and the result of each 1vs1 clash between them. This result has a value of 1 if the rider whose rating is being updated finished ahead of his opponent, or 0 if he finished after him.
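As a rough sketch of how those 1vs1 results can be derived, the helper below turns a finishing order into a list of opponents and a matching list of outcomes for each rider. The function name and the use of plain rider names are assumptions made for illustration.

```python
def pairwise_outcomes(finish_order):
    """Map each rider to (opponents, outcomes), given riders listed winner first.

    outcomes[i] is 1 if the rider finished ahead of opponents[i], else 0.
    """
    results = {}
    n = len(finish_order)
    for pos, rider in enumerate(finish_order):
        opponents = finish_order[:pos] + finish_order[pos + 1:]
        # Everyone listed before the rider beat him; everyone after lost to him.
        outcomes = [0] * pos + [1] * (n - pos - 1)
        results[rider] = (opponents, outcomes)
    return results


clashes = pairwise_outcomes(["Groenewegen", "Cavendish", "Nizzolo"])
print(clashes["Cavendish"])
```

For the rider in second place, the outcome list starts with a 0 (he lost to the winner) followed by a 1 for every rider behind him.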

Then, through several steps, and taking into account the expected result and the opponent’s RD, we compute the estimated variance of the player’s rating, the estimated improvement in rating, and the new volatility value. Once we have this, we update the rating and RD of the player.
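The steps above can be sketched as follows. This is a minimal, self-contained version that follows Mark Glickman's published description of Glicko-2; it is not necessarily identical to the internals of the glicko2 package, and the function names, the system constant `tau=0.5` and the tolerance `eps` are assumptions.

```python
import math

SCALE = 173.7178  # conversion factor between the Glicko and Glicko-2 scales


def g(phi):
    """Weight that reduces the impact of opponents with a high RD."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi**2 / math.pi**2)


def expected(mu, mu_j, phi_j):
    """Expected result against an opponent with rating mu_j and deviation phi_j."""
    return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))


def glicko2_update(rating, rd, vol, opp_ratings, opp_rds, outcomes,
                   tau=0.5, eps=1e-6):
    """Return (new_rating, new_rd, new_volatility) after one set of 1vs1 results."""
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    opps = [((r - 1500.0) / SCALE, d / SCALE)
            for r, d in zip(opp_ratings, opp_rds)]
    gs = [g(p) for _, p in opps]
    es = [expected(mu, m, p) for m, p in opps]

    # Estimated variance of the rating, based only on the game outcomes.
    v = 1.0 / sum(gj**2 * e * (1.0 - e) for gj, e in zip(gs, es))
    # Estimated improvement in rating.
    score_sum = sum(gj * (s - e) for gj, s, e in zip(gs, outcomes, es))
    delta = v * score_sum

    # New volatility, found iteratively as the root of f.
    a = math.log(vol**2)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta**2 - phi**2 - v - ex)
        return num / (2.0 * (phi**2 + v + ex)**2) - (x - a) / tau**2

    A = a
    if delta**2 > phi**2 + v:
        B = math.log(delta**2 - phi**2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    new_vol = math.exp(A / 2.0)

    # Updated deviation and rating, converted back to the original scale.
    phi_star = math.sqrt(phi**2 + new_vol**2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star**2 + 1.0 / v)
    new_mu = mu + new_phi**2 * score_sum
    return SCALE * new_mu + 1500.0, SCALE * new_phi, new_vol
```

Called with Glickman's worked example (a player rated 1500 with RD 200 and volatility 0.06, who beats an opponent rated 1400 with RD 30 and loses to opponents rated 1550 with RD 100 and 1700 with RD 300), the sketch should return values close to the published 1464.06, 151.52 and 0.05999.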

```python
# One call per rider. Each call passes the opponents' ratings, the opponents' RDs,
# and the 1vs1 outcomes (1 = finished ahead of that opponent, 0 = finished behind).
Groenewegen.update_player([Cavendish.rating, Nizzolo.rating, DeKleijn.rating, Bol.rating, Pedersen.rating],
                          [Cavendish.rd, Nizzolo.rd, DeKleijn.rd, Bol.rd, Pedersen.rd],
                          [1, 1, 1, 1, 1])
Cavendish.update_player([Groenewegen.rating, Nizzolo.rating, DeKleijn.rating, Bol.rating, Pedersen.rating],
                        [Groenewegen.rd, Nizzolo.rd, DeKleijn.rd, Bol.rd, Pedersen.rd],
                        [0, 1, 1, 1, 1])
Nizzolo.update_player([Cavendish.rating, Groenewegen.rating, DeKleijn.rating, Bol.rating, Pedersen.rating],
                      [Cavendish.rd, Groenewegen.rd, DeKleijn.rd, Bol.rd, Pedersen.rd],
                      [0, 0, 1, 1, 1])
DeKleijn.update_player([Cavendish.rating, Nizzolo.rating, Groenewegen.rating, Bol.rating, Pedersen.rating],
                       [Cavendish.rd, Nizzolo.rd, Groenewegen.rd, Bol.rd, Pedersen.rd],
                       [0, 0, 0, 1, 1])
Bol.update_player([Cavendish.rating, Nizzolo.rating, DeKleijn.rating, Groenewegen.rating, Pedersen.rating],
                  [Cavendish.rd, Nizzolo.rd, DeKleijn.rd, Groenewegen.rd, Pedersen.rd],
                  [0, 0, 0, 0, 1])
Pedersen.update_player([Cavendish.rating, Nizzolo.rating, DeKleijn.rating, Bol.rating, Groenewegen.rating],
                       [Cavendish.rd, Nizzolo.rd, DeKleijn.rd, Bol.rd, Groenewegen.rd],
                       [0, 0, 0, 0, 0])
```

In this example, six riders contested the win on the 1st stage of the Tour of Denmark. Retrieving the prior rating and RD of the contenders, we update each player’s values to get their new true skill rating. This had to be done for every single bunch sprint contested this season, so I had to invest a decent amount of time in it.
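Writing these calls by hand for every sprint is repetitive, so the same update could be wrapped in a small helper. The sketch below is hypothetical: it assumes player objects that expose `rating`, `rd` and an `update_player(ratings, rds, outcomes)` method that modifies the player in place. Under that assumption, rider-by-rider calls would let later riders read ratings already changed by earlier calls, so the helper takes a snapshot of the prior values first.

```python
def update_race(finish_order):
    """Update every rider of one bunch sprint; finish_order lists the winner first.

    Each element must expose .rating, .rd and
    .update_player(ratings, rds, outcomes) (assumed interface).
    """
    # Snapshot the prior values so every rider is rated against pre-race numbers.
    prior = [(p.rating, p.rd) for p in finish_order]
    n = len(finish_order)
    for pos, player in enumerate(finish_order):
        opponents = prior[:pos] + prior[pos + 1:]
        ratings = [rating for rating, _ in opponents]
        rds = [rd for _, rd in opponents]
        # Lost to everyone ahead (0), beat everyone behind (1).
        outcomes = [0] * pos + [1] * (n - pos - 1)
        player.update_player(ratings, rds, outcomes)
```

With such a helper, the stage above would reduce to a single call like `update_race([Groenewegen, Cavendish, Nizzolo, DeKleijn, Bol, Pedersen])`.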

As you may guess, the RD decreases after each sprint contested, showing that the more races a rider takes part in (the more information we have about the rider’s ability), the more precise his rating is. Someone who has contested just a couple of sprints since his debut but finished in the top 3 in both will see his rating greatly increased, but his RD will still be high. In order to keep a good rating with a lower RD, he will have to prove on future occasions that it was not only by chance.

**Scenario checking**

Inside the Glicko rating calculations we find the ‘E’ function, which can be interpreted as the probability of a certain rider finishing ahead of his opponent. It will always be greater than 0 and less than 1, depending on the difference between the ratings of the riders.
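A minimal sketch of this function as defined in Glickman's Glicko-2 description, written for ratings on the familiar 1500-based scale. Note that, besides the rating difference, the opponent's RD also enters through the weight `g`. The function names are assumptions.

```python
import math

SCALE = 173.7178  # conversion factor between the Glicko and Glicko-2 scales


def g(rd):
    """Weight that shrinks towards 0 as the opponent's RD grows."""
    phi = rd / SCALE
    return 1.0 / math.sqrt(1.0 + 3.0 * phi**2 / math.pi**2)


def expected_score(rating, opp_rating, opp_rd):
    """'E' function: expected result of a rider against one opponent."""
    diff = (rating - opp_rating) / SCALE
    return 1.0 / (1.0 + math.exp(-g(opp_rd) * diff))


# A 1500-rated rider against a slightly weaker, well-known opponent.
print(round(expected_score(1500, 1400, 30), 3))
# The same rider against a much stronger opponent whose rating is uncertain.
print(round(expected_score(1500, 1700, 300), 3))
```

Two equally rated riders get an expected result of 0.5, and a high opponent RD pulls the value towards 0.5 because the rating gap is less trustworthy.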

This value is helpful to understand the reasoning behind the rating changes, and hopefully, with these two examples, you will be able to understand the process behind it anytime I mention the Glicko rating of a rider from now on.

The 2nd stage of the Giro was the first chance for the sprinters that started the race the day before in Torino, and Merlier got the win for his team Alpecin-Fenix on their debut in a Grand Tour. Nine of the riders that I tracked during the year contested that sprint, some of them performing as expected but others contradicting the Glicko rating they had before the race.

It was Groenewegen’s first race of the season after his sanction, so his rating was the default one. According to his Glicko rating, Ewan was the favorite to take the win, with a favorable expected result in each of the 1vs1 clashes against the rest of the contenders. He wasn’t able to really fight for the stage, so his Glicko rating decreased notably.

As more data is collected and riders take part in more sprints, they will reach a consistent rating that should only be affected by results out of the ordinary, like a second-class sprinter that suddenly wins a race in which top riders were on the startlist.

Late in the season, Philipsen got an “easy win” in the Classique Paris-Chauny, his 4th victory in 10 days. He was the main rider in the race and he delivered, which was undeniably an important individual result, but it did not have a big effect on his Glicko rating. In line with this reasoning, his rivals lost against a top-class rider, so the change in their rating after the race came mainly from the outcome of their 1vs1 clashes against the other riders.

**Glicko Ratings after the 2021 season**

After writing all the necessary lines of code, it was time to see if the output of the Glicko rating system at the end of the season was in line with my intuition.

Bennett is rated as the best sprinter of the season, although he hasn’t finished a race since May. He reached this rating by performing at a high level in each of the 10 sprints he contested, even when the level of his opponents was high.

Jakobsen won a lot too, but because some of his wins were against low-rated riders, he did not manage to reach the number of points of Bennett.

Ahead of the 2022 season, the RD of all riders will be manually updated to account for the uncertainty of a new season (some riders changing teams, others racing after a long break), and neo-professionals with world-class potential will be included.
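One possible way to do such an update, shown here only as a sketch, is the rule Glicko-2 applies to a player who does not compete during a rating period: the squared deviation grows by the squared volatility. The function name, the number of idle periods and the cap at the default RD of 350 are assumptions, not necessarily the adjustment that will be used here.

```python
import math

SCALE = 173.7178  # conversion factor between the Glicko and Glicko-2 scales
MAX_RD = 350.0    # assumed cap: never more uncertain than an unrated rider


def inflate_rd(rd, vol, periods=1):
    """Increase the RD for a number of rating periods without racing."""
    phi = rd / SCALE
    # Applying phi' = sqrt(phi^2 + vol^2) once per idle period.
    new_phi = math.sqrt(phi**2 + periods * vol**2)
    return min(SCALE * new_phi, MAX_RD)


# A hypothetical rider with RD 50 and volatility 0.06 after an off-season
# counted as three idle periods.
print(round(inflate_rd(50, 0.06, periods=3), 1))
```

The rating itself is left untouched; only the confidence in it is reduced, so the first results of the new season move it more than they otherwise would.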

**Remarks**

I’m confident I have taken into account all sprinters that will contest the main bunch finishes in the years to come, but it is possible that someone who is now under the radar suddenly starts sprinting like the best. That would not be ideal, as his past results would need to be included, with the subsequent modifications to the code it implies.

Data was manually inserted following a chronological order, starting from the Grand Prix Cycliste la Marseillaise in January to the last UCI .1 race of the year.

There are no fixed rules, but I consider a bunch sprint to be a sprint finish in which at least 20 riders contest the win. Also, if a rider was the lead-out man of a teammate rather than the designated sprinter of the team, I did not take him into account for that race.

Although this article will not reveal the best sprinter of all time, it can be the starting point for a quantifiable way of evaluating sprinting performance. Cycling cannot be considered a zero-sum game, but this particular application to bunch sprints can be valuable for different purposes. Additionally, as long as physiological data remains inaccessible to the public, I will continue trying to make the most of what is available.