Modeling and Prediction of the UEFA EURO 2024 via Combined Statistical Learning Approaches

Andreas Groll, Lars M. Hvattum, Christophe Ley, Jonas Sternemann, Gunther Schauberger, Achim Zeileis

TL;DR

Three fundamentally different machine learning models are combined into a new, joint model for forecasting the UEFA EURO 2024, which identifies France as the clear favourite with a winning probability of 19.2%, followed by England and host Germany.

Abstract

In this work, three fundamentally different machine learning models are combined to create a new, joint model for forecasting the UEFA EURO 2024. Specifically, a generalized linear model, a random forest model, and an extreme gradient boosting model are used to predict the number of goals a team scores in a match. The three models are trained on the match results of the UEFA EUROs 2004-2020, with additional covariates characterizing the teams for each tournament as well as three enhanced variables derived from different ranking methods for football teams. The first enhanced variable is based on historic match data from national teams, the second is based on the bookmakers' tournament winning odds of all participating teams, and the third is based on historic match data of individual players for both club and international matches, resulting in player ratings. Then, based on current covariate information of the participating teams, the final trained model is used to predict the UEFA EURO 2024. For this purpose, the tournament is simulated 100,000 times, based on the estimated expected number of goals for all possible matches, from which probabilities across the different tournament stages are derived. Our combined model identifies France as the clear favourite with a winning probability of 19.2%, followed by England (16.7%) and host Germany (13.7%).
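
To make the simulation step concrete, the following is a minimal Python sketch of how expected goals from several models might be combined and turned into a match-level win probability by repeated sampling. It is not the authors' implementation: the abstract does not state how the three models are combined or how drawn knockout matches are resolved, so the simple average in `combined_expected_goals`, the independent Poisson goal counts, and the coin flip for draws are all assumptions, and every function name is hypothetical.

```python
import math
import random


def combined_expected_goals(predictions):
    """Average the expected-goal predictions of several models.

    Assumption: a plain mean; the abstract does not specify the combination rule.
    """
    return sum(predictions) / len(predictions)


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_win_probability(rate_a, rate_b, n_sims=100_000, seed=0):
    """Estimate the probability that team A advances in a single knockout match.

    Assumptions: goals of both teams are independent Poisson counts, and a
    drawn match is settled by a fair coin flip (a stand-in for extra time
    and penalties).
    """
    rng = random.Random(seed)
    wins_a = 0
    for _ in range(n_sims):
        goals_a = sample_poisson(rate_a, rng)
        goals_b = sample_poisson(rate_b, rng)
        if goals_a > goals_b or (goals_a == goals_b and rng.random() < 0.5):
            wins_a += 1
    return wins_a / n_sims
```

For example, `simulate_win_probability(combined_expected_goals([1.6, 1.4, 1.5]), 1.1)` would estimate how often a team with the averaged rate beats an opponent expected to score 1.1 goals. A full tournament simulation would repeat this over the group stage and the knockout bracket and count how often each team reaches each stage.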

Paper Structure

This paper contains 21 sections, 23 equations, 5 figures, 13 tables.

Figures (5)

  • Figure 1: Bar plot showing the variable importance of a Cforest model (with mtry$=4$) trained on UEFA EURO 2004-2020 data.
  • Figure 2: Plot showing the time weight of a match played $x_m$ days ago obtained through the $w_{time,m}(x_m)$ function for a Half period of three years (i.e. $3 \cdot 365.25 = 1095.75$ days).
  • Figure 3: Exemplary ctree created on one single bootstrap sample of our training data, i.e. all matches from the EUROs 2004-2020.
  • Figure 4: $MAE_{diff}$ obtained through permutation.
  • Figure 5: Bar plot displaying the variable importance of the final fitted Cforest model used in the combined model.
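
The time weight in Figure 2 can be illustrated with a short sketch. The caption gives only the Half period, not the functional form, so the exponential half-life decay below, $w_{time,m}(x_m) = (1/2)^{x_m / \text{Half period}}$, is an assumption, and `time_weight` is a hypothetical name.

```python
HALF_PERIOD_DAYS = 3 * 365.25  # 1095.75 days, the Half period from the Figure 2 caption


def time_weight(days_ago, half_period=HALF_PERIOD_DAYS):
    """Weight of a match played `days_ago` days ago.

    Assumption: exponential decay that halves the weight every `half_period` days.
    """
    return 0.5 ** (days_ago / half_period)
```

Under this assumed form, a match played today has weight 1, a match played one Half period ago has weight 0.5, and a match played two Half periods ago has weight 0.25.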