Advanced Soccer Analytics: EURO 2020 Quarter-Finals Predictions (Choose Your Poisson)


And the scientific predictions keep coming! We’re in the final lap of the EURO 2020 and in 10 days we will know the winner of this exciting tournament. For those, like me, curious to know what may happen, the Bivariate Poisson model is here!

Below is another article from some of the global academic leaders in soccer analytics. After their predictions for the EURO 2020 Round of 16, re-published here, the AUEB Sports Analytics Group make their predictions for the Quarter-Finals.

The founding members of the AUEB Sports Analytics Group, Ioannis Ntzoufras and Dimitris Karlis, have been publishing important papers since 2003. If you are seriously interested in soccer analytics, as a professional or as a researcher, you MUST look at their work if you have not done so already. They have been a personal inspiration to me and many others as they have opened the field to many sports analytics enthusiasts. I was actually lucky enough to do my thesis with Prof. Karlis a few years back.

Recently they published articles with their EURO 2020 predictions using advanced statistical modeling here, here, and here, all in Greek, alongside Leonardo Egidi from the University of Trieste. They have been quite successful with their predictions in fact! Most importantly though, their work is soon to be open to an even larger crowd. With an upcoming book in soccer analytics and an R package close to its final version, these are promising times for the analytics field.

I would like to thank the authors for allowing us to republish their article, which I have loosely translated from Greek to English. Feel free to reach out with any questions. Enjoy!

The below article was originally published in Greek at the link here on July 2, 2021.

Predictions for the EURO 2020 Based on Football Analytics Statistical Models – The Quarter-Finals Stage

The AUEB Sports Analytics Group research team makes predictions for the EURO 2020 Quarter-Finals

The EURO 2020 is turning out to be the tournament of surprises and high scores.

The model managed to successfully predict the good and efficient performance of Belgium and England.

It also predicted that the match between Spain and Croatia would be balanced (with a slight advantage in favor of Spain, as it turned out in overtime).

Italy had a hard time but eventually progressed, while the other knockout matches brought surprises that were difficult for any model or expert of the sport to predict.

The details and definition of the model are given at the end of this article.

The predictions for the round of 16 can be found here in Greek (or here in English).

The Model’s Predictions for the Quarter-Finals

The predictions of the model are summarized in the table that follows. Along with the probabilities for each result, the table gives the score with the highest probability (with that probability in parentheses) and the expected score, rounded to the nearest integer.

Based on the above results we see that all games seem balanced, with the teams being close. Belgium has a marginal edge over Italy with a 54% chance of winning, while the combined probability that the game goes into overtime or that Italy wins is not far behind at 46%. We have similar predictions for the match between Ukraine and England. England has a marginal edge with a 56% chance of winning, while Ukraine takes the game into overtime or wins with a combined probability of 44%.

The match between Denmark and the Czech Republic seems to be more balanced than the two previous games, with the probability of a Danish win being 46% and the combined probability of overtime or a Czech win being 54%.

Last, the most balanced game is Switzerland-Spain, with Spain having a slight edge and a 39.7% chance of winning (the picture is similar to that of the Croatia-Spain match). To conclude, no match seems to have a clear winner and everything is possible, especially in a tournament like this year’s.

The following graphs depict the chances for each score in each match. The darker colors indicate the most probable results while the lighter areas indicate results with lower chances.
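To make the link between such a score grid and the reported figures concrete, here is a minimal Python sketch. It assumes, purely for simplicity, that the two teams' goals are independent Poisson counts, which is not the authors' bivariate model, and the scoring rates `1.4` and `1.1` are hypothetical values rather than estimates from the article. The function names are likewise illustrative.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate**k / math.factorial(k)

def score_grid(rate_1, rate_2, max_goals=10):
    """grid[x][y] = P(Team 1 scores x and Team 2 scores y), assuming independence."""
    return [
        [poisson_pmf(x, rate_1) * poisson_pmf(y, rate_2) for y in range(max_goals + 1)]
        for x in range(max_goals + 1)
    ]

def summarize(grid):
    """Collapse a score grid into the quantities reported for each match."""
    cells = [(x, y, p) for x, row in enumerate(grid) for y, p in enumerate(row)]
    return {
        "team_1_win": sum(p for x, y, p in cells if x > y),
        "draw": sum(p for x, y, p in cells if x == y),  # game goes into overtime
        "team_2_win": sum(p for x, y, p in cells if x < y),
        "most_likely_score": max(cells, key=lambda c: c[2])[:2],
        "expected_score": (
            round(sum(x * p for x, y, p in cells)),
            round(sum(y * p for x, y, p in cells)),
        ),
    }

# Hypothetical scoring rates, chosen only for illustration.
summary = summarize(score_grid(rate_1=1.4, rate_2=1.1))
print(summary)
```

Each cell of the grid corresponds to one square of a score graph, and the "overtime or the other team wins" figures quoted in the text are simply one minus the win probability of the favored team.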

The predictions are made for scientific purposes and are not encouragement or advice for betting.

Bibliography for fans that like to read

· Dixon, M.J. and Coles, S.G. (1997). Modelling Association Football Scores and Inefficiencies in the Football Betting Market. Journal of the Royal Statistical Society: Series C (Applied Statistics), 46, 265-280.

· Karlis, D. and Ntzoufras, I. (2003). Analysis of sports data by using bivariate Poisson models. Journal of the Royal Statistical Society: Series D (The Statistician), 52, 381-393.

· Lee, A.J. (1997). Modeling Scores in the Premier League: Is Manchester United Really the Best? Chance, 10, 15-19.

· Maher, M.J. (1982). Modelling association football scores. Statistica Neerlandica, 36, 109-118.

· Reep, C. and Benjamin, B. (1968). Skill and Chance in Association Football. Journal of the Royal Statistical Society: Series A (General), 131, 581-585.

The identity of the model

The technique and art of statistical modeling can be applied directly to sports, and to soccer in particular, to make reliable predictions for upcoming games, which is where the interest of fans increases dramatically.

The use of statistical techniques for predicting the outcomes of soccer games first appeared in the scientific literature in 1968 with the pioneering publication of Reep & Benjamin. The next true innovations came in the 80’s with Michael Maher’s work and in 1997 with the work of Lee, who posed the question of whether Manchester United was truly the best team. The question was addressed with a simple statistical model and simulation. These analyses set the foundations of modern modeling in soccer and sports. The next important publications were the Dixon & Coles paper in 1997 and the bivariate Poisson model of Karlis and Ntzoufras in 2003 (two of the authors of this analysis). These two models set the foundation of modern prediction models for soccer games.

The basic idea of the statistical model of Athens University of Economics and Business professors Karlis and Ntzoufras is an extension of the well-known Poisson distribution for predicting the number of goals each team will score. The expected number of goals is written as a function of the home effect, which can thereby be quantified, and of the attacking and defensive abilities of the teams. Here a variation of this model is used to predict the EURO 2020 games. Moreover, the model uses time-dynamic variables that reflect team strength, as well as the difference in ranking between the two opponents based on the Coca-Cola FIFA ranking of May 27th, 2021. The model was estimated using the Bayesian approach with the statistical software R and Stan. These predictions have a similar precision to those used by betting companies.
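For readers who want to see what this extension of the Poisson distribution looks like in practice, below is a minimal Python sketch of the standard bivariate Poisson probability function from Karlis and Ntzoufras (2003). It is not the variation fitted in this article (which is estimated in R and Stan), and the parameter values `1.2`, `0.9` and `0.2` are hypothetical. In this formulation the goals are written as X = W1 + W3 and Y = W2 + W3 with independent Poisson components, so the third rate acts as the covariance between the two scores.

```python
import math

def bivariate_poisson_pmf(x, y, lam1, lam2, lam3):
    """P(X = x, Y = y) for X = W1 + W3 and Y = W2 + W3, where W1, W2, W3 are
    independent Poisson variables with rates lam1, lam2, lam3 (lam1, lam2 > 0)."""
    base = (
        math.exp(-(lam1 + lam2 + lam3))
        * lam1**x / math.factorial(x)
        * lam2**y / math.factorial(y)
    )
    ratio = lam3 / (lam1 * lam2)
    # Sum over the number k of "shared" goals contributed by W3.
    shared = sum(
        math.comb(x, k) * math.comb(y, k) * math.factorial(k) * ratio**k
        for k in range(min(x, y) + 1)
    )
    return base * shared

# Hypothetical parameter values, chosen only for illustration.
lam1, lam2, lam3 = 1.2, 0.9, 0.2
goals = range(31)
pmf = {(x, y): bivariate_poisson_pmf(x, y, lam1, lam2, lam3) for x in goals for y in goals}

total = sum(pmf.values())
mean_x = sum(x * p for (x, y), p in pmf.items())  # close to lam1 + lam3
mean_y = sum(y * p for (x, y), p in pmf.items())  # close to lam2 + lam3
cov_xy = sum(x * y * p for (x, y), p in pmf.items()) - mean_x * mean_y  # close to lam3
```

Setting `lam3` to zero reduces the function to the product of two independent Poisson probabilities, which is the simpler model used in much of the earlier literature.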

The Magic Equations of the statistical model

  • i is the game identifier
  • Xi and Yi are the numbers of goals scored by Team 1 and Team 2 respectively in game i
  • home is the home effect (only for games where applicable). In EURO tournaments most matches usually take place at a neutral arena, so this bonus is not added to either of the opposing teams
  • λ1i and λ2i are the scoring-rate parameters of Team 1 and Team 2 respectively (or home and away team, where applicable) for game i
  • attk,t and defk,t are the parameters that estimate the attacking and defensive ability respectively of team k at time t (dynamic parameters that change throughout time)
  • ranking is the Coca-Cola FIFA ranking on May 27 2021 for team k
  • γ/2 is the effect of the Coca-Cola FIFA ranking on the log of expected goals
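One way to write a model that is consistent with the definitions above is sketched below in LaTeX. This is a hedged reconstruction and not necessarily the authors' exact specification: the indices h_i and a_i (Team 1 and Team 2 in game i), the common covariance parameter λ3, the absence of an intercept, and the sign convention on the ranking difference are all assumptions.

```latex
\begin{align*}
(X_i, Y_i) &\sim \text{BivariatePoisson}(\lambda_{1i}, \lambda_{2i}, \lambda_{3}) \\
\log \lambda_{1i} &= \text{home} + \text{att}_{h_i,t} + \text{def}_{a_i,t}
  + \frac{\gamma}{2}\left(\text{ranking}_{h_i} - \text{ranking}_{a_i}\right) \\
\log \lambda_{2i} &= \text{att}_{a_i,t} + \text{def}_{h_i,t}
  - \frac{\gamma}{2}\left(\text{ranking}_{h_i} - \text{ranking}_{a_i}\right)
\end{align*}
```

Under this form the ranking difference enters the two log rates with opposite signs, so the full gap between the teams on the log scale is γ times the ranking difference, which is why each team's share is written as γ/2.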

A few words about the Authors

Leonardo Egidi is assistant professor of Statistics at the University of Trieste and a member of the research team of the AUEB Sports Analytics Group. He holds a PhD in modeling and soccer analytics and has done extensive research in Bayesian statistical methodology.

Ioannis Ntzoufras is professor of Statistics and president of the Department of Statistics at Athens University of Economics and Business. He is a founding member of the AUEB Sports Analytics Group research team along with Dimitris Karlis. He has recognized scientific work in subjects such as Bayesian statistical modeling, computational statistics, Biostatistics, psychometrics, and sports analytics.

Dimitris Karlis is professor of Statistics and deputy president of the Department of Statistics at Athens University of Economics and Business. He is a founding member of the AUEB Sports Analytics Group research team along with Ioannis Ntzoufras. He has recognized scientific work in subjects such as statistical methodology, computational statistics, Biostatistics, and sports analytics.

The three authors of this article are currently working on writing a book on Football Analytics for an international publication while in the latest workshop of the team they gave a seminar lecture on Football Analytics.

The AUEB Sports Analytics Group research team of Athens University of Economics and Business was founded in 2015 by professors Ioannis Ntzoufras and Dimitris Karlis. Its members include prominent figures of the sports analytics community such as Stefan Kesenne (University of Antwerp & Leuven), Leonardo Egidi (University of Trieste), Ioannis Kosmidis (Warwick), Constantinos Pelechrinis (Pittsburgh), Nial Friel (UCD), and Gianluca Baio (UCL), as well as former coach of the Greek National Volleyball team Sotiris Drikos. The research team is responsible for an annual series of conferences under the name AUEB Sports Analytics Workshop (5 in total), while in 2019 it organized the international conference MathSport 2019 with 200 participating scientists from around the world. Last, the team has a series of important scientific publications in the field of sports analytics.

