What a EURO it has been! It has had it all: emotions, tension, goals, own goals, broken records, controversial decisions. EURO 2020 will definitely be remembered for years to come!
Below is another article from some of the global academic leaders in soccer analytics. After their predictions for the EURO 2020 Semi-Finals, Quarter-Finals, and Round of 16, the AUEB Sports Analytics Group make their predictions for the EURO 2020 Final.
The founding members of the AUEB Sports Analytics Group, Ioannis Ntzoufras and Dimitris Karlis, have been publishing important papers since 2003. If you are seriously interested in soccer analytics, as a professional or as a researcher, you MUST look at their work if you have not done so already. They have been a personal inspiration to me and many others as they have opened the field to many sports analytics enthusiasts. I was actually lucky enough to do my thesis with Prof. Karlis a few years back.
Recently, alongside Leonardo Egidi from the University of Trieste, they have published a series of articles in Greek with their EURO 2020 predictions, built on advanced statistical modeling. Their predictions have in fact been quite successful! Most importantly though, their work will soon be open to an even larger audience. With an upcoming book on soccer analytics and an R package close to its final version, these are promising times for the analytics field.
I would like to thank the authors for allowing us to republish their article. I have loosely translated it from Greek to English myself. Feel free to reach out with any questions. Enjoy!
The article below was originally published in Greek on July 11, 2021.
Predictions for EURO 2020 Based on Football Analytics Statistical Models – The Final
Following two exciting semi-finals, the teams with the highest chances of making it to the final progressed, albeit with difficulty.
The model's predictions for the final are given in the table and graph that follow. Along with the probability of each result, they give the most likely score (with its probability in parentheses) and the expected number of goals for each team (to two decimal places). The expected number of goals is around 1 for England (0.91 to be precise) and 0.62 for Italy. The difference in expected goals is 0.29 (less than 1 and one of the smallest we have seen in EURO 2020), which indicates how closely matched the two opponents are and how uncertain the final result is.
Based on the above results, we see that the final is relatively balanced (as expected), with a marginal advantage for England over Italy. Specifically, England wins with a probability of 41.5%, while the probability of the match going to extra time or Italy winning is 58.5%. Note that in this EURO we have seen a high number of games go to extra time. This could be taken into account with the corrected model that our research team added to the literature in 2003, which adds an inflation term for draws in order to estimate them correctly. Based on the results we have seen, this corrected model would indicate here that a draw is the most likely outcome (but again, England would have a marginal advantage).
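To make the effect of inflating draws concrete, here is a minimal Python sketch. It is not the authors' fitted Bayesian model: it assumes the two teams' goals are independent Poisson variables with the expected goals quoted above (0.91 and 0.62), and the function names and the inflation weight of 0.10 are hypothetical choices made only for illustration. Under this simplification England wins with a probability of roughly 40%, close to but not identical to the 41.5% reported by the model.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals for a Poisson variable with the given mean."""
    return exp(-mean) * mean**k / factorial(k)


def outcome_probs(mean_1, mean_2, inflation=0.0, max_goals=15):
    """Return (team 1 win, draw, team 2 win) probabilities.

    Goals are assumed independent Poisson (a simplification). `inflation` is a
    hypothetical mixing weight: every score keeps a share (1 - inflation) of
    its probability and the freed mass is moved onto drawn scores.
    """
    win_1 = draw = win_2 = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = poisson_pmf(x, mean_1) * poisson_pmf(y, mean_2)
            if x > y:
                win_1 += p
            elif x == y:
                draw += p
            else:
                win_2 += p
    total = win_1 + draw + win_2  # renormalise the tiny truncated tail
    win_1, draw, win_2 = win_1 / total, draw / total, win_2 / total
    return (
        (1 - inflation) * win_1,
        (1 - inflation) * draw + inflation,
        (1 - inflation) * win_2,
    )


plain = outcome_probs(0.91, 0.62)
inflated = outcome_probs(0.91, 0.62, inflation=0.10)  # illustrative weight only
for label, probs in (("no inflation", plain), ("draws inflated", inflated)):
    print(label, [f"{p:.1%}" for p in probs])
```

Without inflation an England win is the single most likely outcome. With the illustrative weight of 0.10 the draw overtakes it, while England remains ahead of Italy, which is the qualitative pattern described above. At the level of win, draw and loss, how the extra mass is spread across 0-0, 1-1 and so on does not matter, so the sketch leaves that choice out.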
The following graphs depict the chances for each score in each match. The darker colors indicate the most probable results while the lighter areas indicate results with lower chances. The most likely score is 0-0 (21%), followed by 1-0 in favor of England (20%), 0-1 in favor of Italy (13%), and a 1-1 draw (12%). Note that the chances of predicting the exact final score are quite small: a probability of 20% or 10% means that if we bet on that score in 10 comparable games, we would expect to be right in only 2 or 1 of them, respectively.
Finally, the graph of the defensive (in blue) and offensive (in red) capabilities of each team shows the balance between the two teams. If we examine the graphs carefully, we see that England is slightly better than Italy on both parameters, which gives England the slight advantage in the final probabilities. (Note: lower values of the defensive parameter indicate better defending, while higher values of the offensive parameter indicate better attacking.)
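The score probabilities quoted above can be approximated with a short Python sketch. It assumes, as a simplification rather than as the authors' actual model, that each team's goals follow an independent Poisson distribution with the expected goals reported for the final (0.91 for England, 0.62 for Italy). The function names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals for a Poisson variable with the given mean."""
    return exp(-mean) * mean**k / factorial(k)


def score_grid(mean_1, mean_2, max_goals=6):
    """Map each score (goals team 1, goals team 2) to its probability,
    assuming independent Poisson goal counts."""
    return {
        (x, y): poisson_pmf(x, mean_1) * poisson_pmf(y, mean_2)
        for x in range(max_goals + 1)
        for y in range(max_goals + 1)
    }


grid = score_grid(0.91, 0.62)  # England, Italy
top_scores = sorted(grid.items(), key=lambda item: item[1], reverse=True)[:4]
for (england, italy), prob in top_scores:
    print(f"England {england}-{italy} Italy: {prob:.1%}")
```

The four most likely scores come out in the same order as in the text (0-0, 1-0, 0-1, 1-1), with probabilities within about a percentage point of the quoted 21%, 20%, 13% and 12%.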
A few words about the model
The technique and art of statistical modeling apply directly to sports, and to soccer in particular, where they can produce reliable predictions for upcoming games at the moments when fan interest rises dramatically.
The use of statistical techniques for predicting the outcomes of soccer games first appeared in the scientific literature in 1968, with the pioneering publication of Reep & Benjamin. The next true innovations came in the 1980s with Michael Maher's work and in 1997 with the work of Lee, who asked whether Manchester United was truly the best team. The answer was confirmed using a simple statistical model and simulation. These analyses laid the foundations of modern modeling in soccer and sports. The next important publications were the Dixon & Coles paper in 1997 and the bivariate Poisson model of Karlis and Ntzoufras in 2003 (two of the authors of this analysis). These two models underpin modern prediction models for soccer games.
The basic idea of the statistical model of Athens University of Economics and Business professors Karlis and Ntzoufras is based on an extension of the well-known Poisson distribution for predicting the number of goals each team will score. The expected number of goals is written as a function of the home effect, which can now be quantified, and the attacking and defensive abilities of the teams. A variation of this model is used here to predict the EURO 2020 games. In addition, it uses time-dynamic parameters that reflect team strength, together with the difference in ranking between the two opponents based on the Coca-Cola FIFA ranking of May 27, 2021. The model was estimated using the Bayesian approach with the statistical software R and Stan. These predictions have a similar precision to those used by betting companies.
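As a rough illustration of the bivariate Poisson idea, the Python sketch below builds the two goal counts as X = W1 + W3 and Y = W2 + W3 from three independent Poisson variables, so that the shared term W3 makes the scores positively correlated. This is the standard construction of the distribution, not the specific variant fitted for EURO 2020, and the parameter values (0.81, 0.52 and 0.10, chosen so that the expected goals stay at 0.91 and 0.62) are hypothetical.

```python
from math import comb, exp, factorial


def bivariate_poisson_pmf(x, y, lam1, lam2, lam3):
    """P(X = x, Y = y) for X = W1 + W3 and Y = W2 + W3, where W1, W2, W3 are
    independent Poisson variables with means lam1, lam2, lam3."""
    base = (
        exp(-(lam1 + lam2 + lam3))
        * lam1**x / factorial(x)
        * lam2**y / factorial(y)
    )
    ratio = lam3 / (lam1 * lam2)
    series = sum(
        comb(x, k) * comb(y, k) * factorial(k) * ratio**k
        for k in range(min(x, y) + 1)
    )
    return base * series


def draw_probability(lam1, lam2, lam3, max_goals=15):
    """Total probability of a drawn score (0-0, 1-1, ...)."""
    return sum(
        bivariate_poisson_pmf(k, k, lam1, lam2, lam3) for k in range(max_goals + 1)
    )


# Same expected goals (0.91 and 0.62) with and without a shared component.
independent = draw_probability(0.91, 0.62, 0.0)
correlated = draw_probability(0.81, 0.52, 0.10)  # illustrative values only
print(f"draw probability, independent goals: {independent:.3f}")
print(f"draw probability, correlated goals:  {correlated:.3f}")
```

Setting lam3 to zero recovers two independent Poisson counts. With a positive lam3 and the same expected goals, the draw probability goes up, which is one reason the bivariate form is attractive for soccer scores.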
The Magic Equations of the statistical model
- i is the game identifier
- X_i and Y_i are the numbers of goals scored by Team 1 and Team 2 in game i
- home is the home effect (only for games where applicable). In EURO tournaments most matches are usually played at a neutral venue, so this bonus is not added to either of the opposing teams
- h_{1i} and a_{2i} are Team 1 and Team 2 respectively (or home and away team, where applicable) for game i
- att_{k,t} and def_{k,t} are the parameters that estimate the attacking and defensive ability respectively of team k at time t (dynamic parameters that change over time)
- ranking_k is the Coca-Cola FIFA ranking on May 27, 2021 for team k
- γ is the effect of the Coca-Cola FIFA ranking on the log of expected goals
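The equations themselves are not reproduced in this text. As a hedged sketch only, one log-linear form consistent with the symbols listed above is shown below. The expected goals \lambda_{1i} and \lambda_{2i} are names introduced here for illustration, the team indices are read as h_{1i} and a_{2i}, and the way the ranking difference enters (split evenly, with opposite signs) is an assumption rather than the authors' confirmed specification.

```latex
\begin{align*}
X_i &\sim \mathrm{Poisson}(\lambda_{1i}), \qquad
Y_i \sim \mathrm{Poisson}(\lambda_{2i}), \\
\log \lambda_{1i} &= \mathrm{home}
  + \mathrm{att}_{h_{1i},t} + \mathrm{def}_{a_{2i},t}
  + \frac{\gamma}{2}\left(\mathrm{ranking}_{h_{1i}} - \mathrm{ranking}_{a_{2i}}\right), \\
\log \lambda_{2i} &= \mathrm{att}_{a_{2i},t} + \mathrm{def}_{h_{1i},t}
  - \frac{\gamma}{2}\left(\mathrm{ranking}_{h_{1i}} - \mathrm{ranking}_{a_{2i}}\right).
\end{align*}
```

In this form each team's expected goals grow with its own attacking parameter and with the opponent's defensive parameter, which matches the earlier note that lower defensive values mean better defending. The home term drops out for matches played at a neutral venue.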
The predictions are made for scientific purposes and are not an encouragement or advice to bet.
Bibliography for fans that like to read
· Dixon, M.J. and Coles, S.G. (1997), Modelling Association Football Scores and Inefficiencies in the Football Betting Market. Journal of the Royal Statistical Society: Series C (Applied Statistics), 46, 265-280.
· Karlis, D. and Ntzoufras, I. (2003), Analysis of sports data by using bivariate Poisson models. Journal of the Royal Statistical Society: Series D (The Statistician), 52, 381-393.
· Lee, A.J. (1997), Modeling Scores in the Premier League: Is Manchester United Really the Best? Chance, 10, 15-19.
· Maher, M.J. (1982), Modelling association football scores. Statistica Neerlandica, 36, 109-118.
· Reep, C. and Benjamin, B. (1968), Skill and Chance in Association Football. Journal of the Royal Statistical Society: Series A (General), 131, 581-585.
A few words about the Authors
Leonardo Egidi is an assistant professor of Statistics at the University of Trieste and a member of the AUEB Sports Analytics Group research team. He holds a PhD in modeling and soccer analytics and has done extensive research in Bayesian statistical methodology.
Ioannis Ntzoufras is a professor of Statistics and president of the Department of Statistics at Athens University of Economics and Business. He is a founding member of the AUEB Sports Analytics Group research team along with Dimitris Karlis. He is recognized for his scientific work in subjects such as Bayesian statistical modeling, computational statistics, biostatistics, psychometrics, and sports analytics.
Dimitris Karlis is a professor of Statistics and deputy president of the Department of Statistics at Athens University of Economics and Business. He is a founding member of the AUEB Sports Analytics Group research team along with Ioannis Ntzoufras. He is recognized for his scientific work in subjects such as statistical methodology, computational statistics, biostatistics, and sports analytics.
The three authors of this article are currently writing a book on Football Analytics for international publication, and at the team's latest workshop they gave a seminar lecture on Football Analytics.
The AUEB Sports Analytics Group research team of Athens University of Economics and Business was founded in 2015 by professors Ioannis Ntzoufras and Dimitris Karlis. Its members include prominent figures in the sports analytics community such as Stefan Kesenne (University of Antwerp & Leuven), Leonardo Egidi (University of Trieste), Ioannis Kosmidis (Warwick), Constantinos Pelechrinis (Pittsburgh), Nial Friel (UCD), and Gianluca Baio (UCL), as well as Sotiris Drikos, former coach of the Greek National Volleyball team. The research team runs an annual series of conferences under the name AUEB Sports Analytics Workshop (5 in total), and in 2019 it organized the international conference MathSport 2019, with 200 participating scientists from around the world. Finally, the team has a series of important scientific publications in the field of sports analytics.