This EURO’s semi-finals contain a very impressive group of players. There are so many players that stand out from each team, each for their own reasons. Some performances come as a surprise, either good or bad, while others have played as expected.
Luckily enough, both Bill’s beloved England team and C. Kotzias’ favorite Denmark are in the next round. We’ve been spending our afternoons talking about our favorite players putting on great displays of football, only to have even more players to talk about from Italy and Spain.
The best thing we could do is to put some data science to the task of helping us understand what these players have in common.
Clustering as a technique
Clustering is a technique that can be used to divide players into groups. These groups, aka clusters, contain players that are similar to one another and different from the players in other clusters. It is an unsupervised learning method, meaning no labels are supplied in advance, and it can help us recognize the natural groups that appear in the data. There are plenty of great resources on the web with more info, and I recommend that anyone interested in data analysis, not just in sports but in sciences and business, should master this technique.
Feel free to contact us for any questions or tips.
Our clustering method
Using data from fbref.com, we selected the stat categories listed in the table below for clustering. Remember, you can pull data from fbref.com in just a few minutes once you follow our beginner’s tutorial.
We then applied agglomerative hierarchical clustering to build our hierarchy of clusters, using Ward’s minimum variance method.
Please note that due to the abundance of articles from statistics experts on the web, we do not go into detail here on the specific techniques. We are always open to ideas though, and many of our posts are a result of our readers’ and followers’ requests. Do not hesitate to reach out for more info!
Category | Description |
Goals | Goals |
xG | Expected Goals. xG totals include penalty kicks, but do not include penalty shootouts (unless otherwise noted). Provided by StatsBomb. |
Shots | Does not include penalty kicks |
SoT | Shots on target. Note: Shots on target do not include penalty kicks |
SCA | Shot-Creating Actions. The two offensive actions directly leading to a shot, such as passes, dribbles and drawing fouls. Note: A single player can receive credit for multiple actions and the shot-taker can also receive credit. |
GCA | Goal-Creating Actions. The two offensive actions directly leading to a goal, such as passes, dribbles and drawing fouls. Note: A single player can receive credit for multiple actions and the shot-taker can also receive credit. |
Assists | Assists |
xA | xG Assisted. xG which follows a pass that assists a shot. Provided by StatsBomb. |
Pass.Comp | Passes Completed |
Key.Pass | Passes that directly lead to a shot (assisted shots) |
Pass.Final.3rd | Completed passes that enter the 1/3 of the pitch closest to the goal. Not including set pieces |
SuccDrib | Dribbles Completed Successfully |
Carries | Number of times the player controlled the ball with their feet |
Touches | Number of times a player touched the ball. Note: Receiving a pass, then dribbling, then sending a pass counts as one touch |
Drib.Dist | Total distance, in yards, a player moved the ball while controlling it with their feet, in any direction |
Pass.Target | Number of times a player was the target of an attempted pass |
Blocks | Number of times blocking the ball by standing in its path |
Interceptions | Interceptions |
Tackles | Number of players tackled |
Pressure | Number of times applying pressure to opposing player who is receiving, carrying or releasing the ball |
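For readers who want to try the method themselves, here is a minimal sketch of agglomerative clustering with Ward's method using SciPy. It is not our exact pipeline: the numbers are made up, the helper name ward_clusters is hypothetical, and standardising each stat to z-scores before clustering is an assumption we add here so that large-count stats such as Touches do not drown out small-count stats such as Goals.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def ward_clusters(stats, n_clusters):
    """Standardise each stat column, build a Ward tree, and cut it into groups."""
    X = np.asarray(stats, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns would otherwise divide by zero
    Z = (X - X.mean(axis=0)) / std  # z-scores (an assumption, not from the post)
    tree = linkage(Z, method="ward")  # Ward's minimum variance criterion
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return tree, labels


# Made-up rows, one per player: [Goals, Shots, Tackles, Interceptions]
toy_stats = [
    [4, 15, 1, 0],
    [3, 12, 2, 1],
    [0, 1, 12, 9],
    [0, 2, 10, 11],
]
tree, labels = ward_clusters(toy_stats, n_clusters=2)
print(labels)  # one cluster id per player
```

With real data you would pass one row per player and one column per category from the table above, and ask for as many clusters as you want to describe.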
Player clusters
The graph below contains the full cluster tree of EURO 2020 players of the teams in the semi-finals, i.e. Italy, Spain, England, and Denmark.
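A cluster tree like this one can be produced from a Ward linkage with SciPy's dendrogram function. The sketch below uses hypothetical player names and made-up stats, and only computes the leaf order rather than drawing anything, so treat it as an illustration of the idea and not as the code behind our graph.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical players with made-up rows: [Goals, Shots, Tackles, Interceptions]
names = ["Forward A", "Forward B", "Defender C", "Defender D"]
toy_stats = np.array([
    [4, 15, 1, 0],
    [3, 12, 2, 1],
    [0, 1, 12, 9],
    [0, 2, 10, 11],
], dtype=float)

tree = linkage(toy_stats, method="ward")

# no_plot=True only computes the layout; drop it (with matplotlib installed)
# to draw the tree itself.
layout = dendrogram(tree, labels=names, no_plot=True)
leaf_order = layout["ivl"]  # player names in left-to-right leaf order
print(leaf_order)
```

Players that end up next to each other in the leaf order are the ones the tree joins earliest, which is why similar players sit side by side in the graph.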
Further below, we have a description for each cluster. We also include the radial plots for the average player in each cluster. The CHI values in the radial plots stand for Cluster Homogeneity Index. They take values between 0 and 1. The higher the value, the more the players in the cluster have in common.
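The "average player" of a cluster is simply the mean of every stat over the players assigned to it. Here is a small sketch of that step, assuming you already have one cluster id per player. The helper name cluster_profiles and the numbers are hypothetical, and the CHI itself is not computed here.

```python
import numpy as np


def cluster_profiles(stats, labels):
    """Return the mean stat line (the 'average player') for each cluster id."""
    X = np.asarray(stats, dtype=float)
    labels = np.asarray(labels)
    return {int(c): X[labels == c].mean(axis=0) for c in np.unique(labels)}


# Made-up rows, one per player: [Goals, Shots, Tackles, Interceptions]
toy_stats = [
    [4, 15, 1, 0],
    [3, 12, 2, 1],
    [0, 1, 12, 9],
    [0, 2, 10, 11],
]
toy_labels = [1, 1, 2, 2]  # hypothetical cluster assignments
profiles = cluster_profiles(toy_stats, toy_labels)
print(profiles[1])  # average stat line of cluster 1
```

For a radial plot you would typically rescale each stat to a common range first, so that one large-count category does not dominate the shape.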
Cluster 1 – How I Wish You Were Here
A lot of articles have been written about the players that have disappointed this year. The cluster that we present is full of good players that have underperformed.
Some teams, like England, started the tournament thinking that they had enough firepower to dominate. But players like Marcus Rashford and Phil Foden never got going. Then it was Saka that was given the chance to shine, at the expense of Jadon Sancho. Yet again, another of their young talents is having a mediocre tournament. Denmark and Spain can boast that none of their starters feature in this list, but for the Italians, Federico Bernardeschi, Andrea Belotti, and especially Giorgio Chiellini should have done better.
With a Cluster Homogeneity Index (CHI) of 0.14, the players in this cluster differ considerably in their stats and show a low level of similarity.


Cluster 2 – It’s Raining Goals, Hallelujah
Now this is a cluster that people can relate to and is certainly easier to read. We are looking for goalscorers, the players that have the most chances to score. Furthermore, these players are excelling in other areas such as assists, dribbles and even applying pressure to opponents.
So what is Joakim Maehle doing on this list? In a few words, he is proving his dominance in every area. I personally have been impressed and can’t stop writing about him.
Raheem Sterling and Harry Kane are England’s primary threats, in a similar fashion to Italy’s Lorenzo Insigne and Ciro Immobile. These teams rely on their goalscorers more than they would have liked, but if it is working for them, who are we to judge?
Spain, on the other hand, finds goals from different players, a characteristic that makes it more difficult for their opponents to defend against them.
With a CHI of 0.74, these 8 players have quite a lot in common.


Cluster 3 – You Shall Not Pass!
If you feature in this cluster, you are excellent at defending with the ability to carry and pass the ball.
It comes as no surprise that Delaney, Christensen, and Kjaer are part of this cluster. All three have been in excellent form for Denmark, contributing greatly to their success.
Azpilicueta and Busquets are similarly very important to Spain, as they put a lot of work into carrying and passing the ball to forward positions while also tending to their defensive responsibilities.
Declan Rice has exceeded everyone’s expectations this year and has been a massive help to his team so far.
With a CHI of 0.35, the players in this group differ from one another in quite a few respects.


Cluster 4 – It’s All Under Control
This cluster presents players that do a lot of different jobs on the field, none of which could be considered eye-catching. Players like these are excellent game in, game out, and their performances tend to be overlooked by the fans.
Fans do not appreciate the player that carries or touches the ball without doing an extra something, be it an assist, a goal, or an impressive dribble. However, for us and especially for their coaches, a player that can dominate their area of control and contribute massively in possession, while also defending like a central defender would, is a pivotal piece in every successful team.
Koke and Jorginho are perfect examples of hard workers that any coach in the game would be delighted to have.
This cluster has a CHI of 0.82, so its players are all very similar.


Cluster 5 – Puff Puff Pass
Next, we move to some of the semi-finals’ finest passers. These players are ready to thread a ball in behind the defender at any given time. Their main function is to create, so they will try to support every attacking effort. Some are primary creative forces, while others use their creative abilities alongside the other areas in which they excel.
Pierre Hojbjerg, Marco Verratti, Nicolo Barella, and Kalvin Phillips are some of the best examples of good passers that have an eye for the final pass their teammates ask for. Jack Grealish proved in no time why he should have been playing since day one: he knows how to deliver assists better than any other player in England’s roster.
Another player that impressed is Spinazzola, but he was unlucky in his last game and we won’t see him again for many months. We wish him a speedy recovery!
Cluster 5 has a CHI of 0.71, which indicates that this group of players has a good level of similarity.


Cluster 6 – I Tried So Hard and Got So Far (but in the end, it doesn’t even matter)
In our second cluster we had players that scored a lot of goals while also contributing in many other areas. This cluster features players that have been underwhelming in almost everything other than pursuing goalscoring opportunities.
While it is surprising to find Matteo Pessina and Manuel Locatelli in this cluster considering their roles on the field, it is rather logical to see Pablo Sarabia, Kasper Dolberg, and Federico Chiesa.
England, with their few goalscorers, don’t have a single player in this cluster. That is because Kane and Sterling have been the ones chasing a goal the most while they have also been occupied with other responsibilities on the field.
Yussuf Poulsen and Dolberg, on the other hand, seem to have one and only one role on the field: to put the ball in the net.
With a CHI of 0.35, the players in this cluster seem to have their own unique features.

