Advanced Soccer Analytics: EURO 2020 Semi-Finals Predictions (I got the Poisson, I got the Remedy)


It’s been a fun EURO so far and personally I’ve watched (almost) every game to see if the predictions turn out true. In the quarter-finals the predictions were spot on for 3 out of 4 matches, as they nailed England’s and Denmark’s edge, as well as the balanced Spain-Switzerland match. The model was off for Belgium but with this year’s exciting Italian team I’m not surprised.

I’m already feeling the blues and will miss the break until my beloved Premier League starts. We still have a few games left though so let’s enjoy them!

Is “football coming home”, finally? Will we see a glorious 1992 repeat with Denmark? Will “Gli Azzurri” win the tournament for the first time since 1968? Or will Spain win their 3rd title in the last 4 tournaments?

Below is another article from some of the global academic leaders in soccer analytics. After their predictions for the EURO 2020 Quarter-Finals and Round of 16, re-published here and here, the AUEB Sports Analytics Group make their predictions for the Semi-Finals.

The founding members of the AUEB Sports Analytics Group, Ioannis Ntzoufras and Dimitris Karlis, have been publishing important papers since 2003. If you are seriously interested in soccer analytics, as a professional or as a researcher, you MUST look at their work if you have not done so already. They have been a personal inspiration to me and many others as they have opened the field to many sports analytics enthusiasts. I was actually lucky enough to do my thesis with Prof. Karlis a few years back.

Recently they published articles with their EURO 2020 predictions using advanced statistical modeling for the semi-finals, quarter-finals, round of 16, and group stage, all in Greek, alongside Leonardo Egidi from the University of Trieste. Their predictions have in fact been quite successful! Most importantly though, their work will soon be open to an even larger audience. With an upcoming book on soccer analytics and an R package close to its final version, these are promising times for the analytics field.

I would like to thank the authors for allowing us to republish their article. It is loosely translated from Greek to English by myself. Feel free to reach out for any questions. Enjoy!

The below article was originally published in Greek at the link here on July 6, 2021.

Predictions for the EURO 2020 Based on Football Analytics Statistical Models – The Semi-Finals Stage

Italy, having surpassed the difficult Belgian obstacle, now seems to be the big favorite according to public intuition, as is confirmed by the model’s latest results

We made it to the final stretch of the EURO 2020. The Quarter-Final stage ran smoother than the previous stage and largely turned out as expected.

The Model’s Predictions for the Semi-Finals

The predictions of the model are summarized in the table that follows. Along with the probabilities for each result, the table gives the most probable score (with its probability in parentheses) and the expected score (to two decimal places). The differences in the expected number of goals are less than 1 (0.51 and 0.60 respectively) because the opponents are closely matched.

| Team 1 | Team 2 | Team 1 Win | Draw | Team 2 Win | Most Probable Score (Probability) | Expected Score (Rounded) | Qualification |
|---|---|---|---|---|---|---|---|
| Italy | Spain | 0.488 | 0.269 | 0.243 | 1-0 (0.139) | 1.36 - 0.85 | Marginal advantage for Italy |
| England | Denmark | 0.505 | 0.269 | 0.226 | 1-0 (0.152) | 1.41 - 0.81 | Marginal advantage for England |

Based on the above results, both games look balanced, with a marginal advantage for Italy over Spain and for England over Denmark. Specifically, Italy has a 49% probability of winning against Spain in regular time, while the combined probability of the match going to extra time or Spain winning is 51%. Likewise, England has a slight edge with a 50.5% probability of beating Denmark, while the combined probability of extra time or a Denmark win is 49.5%.
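To make the link between expected goals and result probabilities concrete, here is a minimal sketch. It assumes each team's goals follow independent Poisson distributions with the expected scores from the table as means. This is a simplification for illustration, not the authors' Bayesian model, so the numbers will be close to but not identical to those in the table. The function names `poisson_pmf` and `match_probabilities` are made up for this example.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number of goals is lam."""
    return exp(-lam) * lam**k / factorial(k)


def match_probabilities(lam1, lam2, max_goals=15):
    """Win/draw/loss probabilities and most likely score for Team 1 vs Team 2.

    Assumes independent Poisson goal counts (an illustrative simplification).
    Scores above max_goals are ignored; their probability is negligible here.
    """
    p_win = p_draw = p_loss = 0.0
    best_score, best_prob = (0, 0), 0.0
    for x in range(max_goals + 1):          # goals for Team 1
        for y in range(max_goals + 1):      # goals for Team 2
            p = poisson_pmf(x, lam1) * poisson_pmf(y, lam2)
            if x > y:
                p_win += p
            elif x == y:
                p_draw += p
            else:
                p_loss += p
            if p > best_prob:
                best_score, best_prob = (x, y), p
    return {
        "win": p_win,
        "draw": p_draw,
        "loss": p_loss,
        "most_likely_score": best_score,
        "most_likely_prob": best_prob,
    }


# Expected scores taken from the table: Italy 1.36, Spain 0.85
result = match_probabilities(1.36, 0.85)
print(result)
```

Under this simplified assumption the most likely score is also 1-0, with a probability of roughly 0.149 against the 0.139 reported in the table. A gap of this kind is expected, since the published figures come from the full model rather than from two independent Poisson distributions with fixed means.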

The following graphs depict the chances for each score in each match. The darker colors indicate the most probable results while the lighter areas indicate results with lower chances.

The following graph depicts the progression of the offensive (in red) and defensive (in blue) capabilities of each team. We see clearly that Italy and England have great defensive abilities (the respective parameter is large in magnitude and equal to -0.3). In contrast, Denmark has the best offensive strength but the worst defensive capabilities. Lastly, the other 3 teams appear similar in offensive ability, based on the model’s estimates.

The favorite for winning the EURO 2020

The model makes Italy the favorite to win the EURO 2020 tournament with a 34% probability, a slight edge over England at 29%. Spain has a 22% chance of winning EURO 2020 and Denmark is fourth with a 14% chance.

A few words about the model

The technique and art of statistical modeling can be applied directly to sports, and to soccer in particular, where it can produce reliable predictions for upcoming games at exactly the moments when fan interest increases dramatically.

The use of statistical techniques for predicting outcomes of soccer games first appeared in the scientific literature in 1968 with the pioneering publication of Reep & Benjamin. The next true innovation came in the 1980s with Michael Maher’s work, followed in 1997 by the work of Lee, who asked whether Manchester United was truly the best team. The answer was confirmed using a simple statistical model and simulation. This analysis laid the foundations of modern modeling in soccer and sports. The next important publications were the Dixon & Coles paper in 1997 and the bivariate Poisson model of Karlis and Ntzoufras in 2003 (two of the authors of this analysis). These two models set the foundation of modern prediction models for soccer games.

The basic idea of the statistical model of Athens University of Economics and Business professors Karlis and Ntzoufras is to extend the well-known Poisson distribution to predict the number of goals each team will score. The expected number of goals is written as a function of the home effect, which can now be quantified, and the attacking and defensive abilities of the teams. Here a variation of this model is used to predict the EURO 2020 games. In addition, the model uses time-dynamic parameters that reflect team strength, together with the difference in ranking between the two opponents based on the Coca-Cola FIFA ranking of May 27th, 2021. The model was estimated using the Bayesian approach with the statistical software R and Stan. These predictions have a precision similar to those used by betting companies.

The Magic Equations of the statistical model

  • i is the game identifier
  • Xi and Yi are the numbers of goals scored by Team 1 and Team 2 respectively in game i
  • home is the home effect (only for games where applicable). In EURO tournaments most matches usually take place at a neutral venue, so this bonus is not added to either of the opposing teams
  • λ1i and λ2i are the expected numbers of goals of Team 1 and Team 2 respectively (or home and away team, where applicable) for game i
  • attk,t and defk,t are the parameters that estimate the attacking and defensive ability respectively of team k at time t (dynamic parameters that change throughout time)
  • ranking is the Coca-Cola FIFA ranking on May 27 2021 for team k
  • γ/2 is the effect of the Coca-Cola FIFA ranking on the log of expected goals
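As a sketch consistent with the definitions above, the model can be written in the following log-linear form. This assumes independent Poisson goal counts, and the indices h_i and a_i (Team 1 and Team 2 of game i) are introduced here for illustration only. The exact specification used by the authors may differ, for example through a bivariate Poisson dependence term or a different sign convention for the ranking difference.

```latex
\begin{align*}
X_i &\sim \mathrm{Poisson}(\lambda_{1i}), \qquad
Y_i \sim \mathrm{Poisson}(\lambda_{2i}) \\
\log \lambda_{1i} &= \mathrm{home} + \mathrm{att}_{h_i,t} + \mathrm{def}_{a_i,t}
  + \frac{\gamma}{2}\left(\mathrm{ranking}_{h_i} - \mathrm{ranking}_{a_i}\right) \\
\log \lambda_{2i} &= \mathrm{att}_{a_i,t} + \mathrm{def}_{h_i,t}
  - \frac{\gamma}{2}\left(\mathrm{ranking}_{h_i} - \mathrm{ranking}_{a_i}\right)
\end{align*}
```

Under this form a negative defensive parameter is a good one: a team with def = -0.3 multiplies its opponent's expected goals by exp(-0.3) ≈ 0.74, all else being equal. The home term is set to zero for matches at a neutral venue.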

The predictions are made for scientific purposes and are not encouragement or advice for betting.

Bibliography for fans that like to read

  • Dixon, M.J. and Coles, S.G. (1997). Modelling Association Football Scores and Inefficiencies in the Football Betting Market. Journal of the Royal Statistical Society: Series C (Applied Statistics), 46, 265-280.
  • Karlis, D. and Ntzoufras, I. (2003). Analysis of sports data by using bivariate Poisson models. Journal of the Royal Statistical Society: Series D (The Statistician), 52, 381-393.
  • Lee, A.J. (1997). Modeling Scores in the Premier League: Is Manchester United Really the Best? Chance, 10, 15-19.
  • Maher, M.J. (1982). Modelling association football scores. Statistica Neerlandica, 36, 109-118.
  • Reep, C. and Benjamin, B. (1968). Skill and Chance in Association Football. Journal of the Royal Statistical Society: Series A (General), 131, 581-585.

A few words about the Authors

Leonardo Egidi is assistant professor of Statistics at the University of Trieste and a member of the research team of the AUEB Sports Analytics Group. He holds a PhD in modeling and soccer analytics and has done extensive research in Bayesian statistical methodology.

Ioannis Ntzoufras is professor of Statistics and president of the Department of Statistics at Athens University of Economics and Business. He is a founding member of the AUEB Sports Analytics Group research team along with Dimitris Karlis. He has recognized scientific work in subjects such as Bayesian statistical modeling, computational statistics, biostatistics, psychometrics, and sports analytics.

Dimitris Karlis is professor of Statistics and deputy president of the Department of Statistics at Athens University of Economics and Business. He is a founding member of the AUEB Sports Analytics Group research team along with Ioannis Ntzoufras. He has recognized scientific work in subjects such as statistical methodology, computational statistics, biostatistics, and sports analytics.

The three authors of this article are currently writing a book on Football Analytics for an international publisher, and at the team's latest workshop they gave a seminar lecture on Football Analytics.

The AUEB Sports Analytics Group, a research team of Athens University of Economics and Business, was founded in 2015 by professors Ioannis Ntzoufras and Dimitris Karlis. Its members are prominent figures in the sports analytics community such as Stefan Kesenne (University of Antwerp & Leuven), Leonardo Egidi (University of Trieste), Ioannis Kosmidis (Warwick), Constantinos Pelechrinis (Pittsburgh), Nial Friel (UCD), and Gianluca Baio (UCL), as well as Sotiris Drikos, former coach of the Greek National Volleyball team. The research team runs an annual series of conferences under the name AUEB Sports Analytics Workshop (5 in total), and in 2019 it organized the international conference MathSport 2019 with 200 participating scientists from around the world. Lastly, the team has a series of important scientific publications in the field of sports analytics.
