Leicester's Tale: Another Perspective on the EPL 2015/16 Through Expected Goals (xG) Modelling

Sheikh Badar Ud Din Tahir, Leonardo Egidi, Nicola Torelli

TL;DR

This study develops an inference-based probabilistic framework grounded in expected goals (xG) to simulate EPL 2015/16 season outcomes from shot-level data, producing season-wide distributions of points, ranks, and outcome probabilities. It compares three xG specifications and uses a Poisson process to propagate first-half xG into second-half match outcomes, enabling ex ante diagnostic signals and quantification of ranking uncertainty. The results show that xG captures the league's broad structure and identifies rare events (e.g., Leicester City’s title) as low-probability but feasible realizations under partial information, rather than deterministic predictions. Overall, xG-based simulations serve best as probabilistic baselines and early-warning tools, providing nuanced insights into team strength, performance deviation, and the likelihood of major season milestones. The framework highlights the importance of uncertainty quantification in football analytics and offers a transparent pipeline for applying xG-driven season analyses to other leagues.

Abstract

Probabilistic modeling is an effective tool for evaluating team performance and predicting outcomes in sports. However, an important question that has not been fully explored is whether these models can reliably reflect actual performance while assigning meaningful probabilities to rare results that differ greatly from expectations. In this study, we develop an inference-based probabilistic framework built on expected goals (xG). This framework converts shot-level event data into season-level simulations of points, rankings, and outcome probabilities. Using the English Premier League 2015/16 season as a case study, we demonstrate that the framework captures the overall structure of the league table. It correctly identifies the top-four contenders and relegation candidates while explaining a significant portion of the variance in final points and ranks. In a full-season evaluation, the model assigns a low probability to extreme outcomes, particularly Leicester City's historic title win, which stands out as a statistical anomaly. We then examine the ex ante inferential and early-diagnostic role of xG using only mid-season information. With first-half data, we simulate the rest of the season and show that teams with stronger mid-season xG profiles tend to earn more points in the second half, even after controlling for their current league position. In this mid-season assessment, Leicester City ranks among the top teams by xG and is given a small but noteworthy chance of winning the league. This suggests that their ultimate success was unlikely but not entirely detached from their actual performance. Our analysis indicates that expected goals models work best as probabilistic baselines for analysis and early-warning diagnostics, rather than as definitive predictors of rare season outcomes.
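To make the simulation idea concrete, the following is a minimal sketch of how per-match xG values could be turned into a distribution of season points and a title probability. It is not the authors' specification: it assumes each side's goals are independent Poisson draws with the match xG as the rate, and the team labels, xG values, and function names (`sample_poisson`, `simulate_points`, `title_probability`, `toy_fixtures`) are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_points(fixtures, n_sims=2000, seed=0):
    """Simulate a fixture list many times.

    fixtures: list of (home, away, home_xg, away_xg) tuples.
    Returns {team: [points in simulation 0, points in simulation 1, ...]}.
    """
    rng = random.Random(seed)
    teams = sorted({team for home, away, _, _ in fixtures for team in (home, away)})
    results = {team: [] for team in teams}
    for _ in range(n_sims):
        points = dict.fromkeys(teams, 0)
        for home, away, home_xg, away_xg in fixtures:
            # Assumption: goals are independent Poisson draws with rate equal to xG.
            home_goals = sample_poisson(home_xg, rng)
            away_goals = sample_poisson(away_xg, rng)
            if home_goals > away_goals:
                points[home] += 3
            elif home_goals < away_goals:
                points[away] += 3
            else:
                points[home] += 1
                points[away] += 1
        for team in teams:
            results[team].append(points[team])
    return results


def title_probability(results):
    """Share of simulations each team tops the table; ties on points are split equally."""
    teams = list(results)
    n_sims = len(results[teams[0]])
    share = dict.fromkeys(teams, 0.0)
    for i in range(n_sims):
        best = max(results[team][i] for team in teams)
        leaders = [team for team in teams if results[team][i] == best]
        for team in leaders:
            share[team] += 1.0 / len(leaders)
    return {team: share[team] / n_sims for team in teams}


# Hypothetical three-team double round-robin; the xG rates are made up, not from the paper.
strength = {"A": 2.0, "B": 1.2, "C": 0.6}
toy_fixtures = [
    (home, away, strength[home], strength[away])
    for home in strength
    for away in strength
    if home != away
]

if __name__ == "__main__":
    simulated = simulate_points(toy_fixtures, n_sims=2000, seed=42)
    print(title_probability(simulated))
```

Under these assumptions a weaker side still tops the table in some fraction of simulated seasons, which is the sense in which a rare title can be low-probability yet feasible. A mid-season variant would fix the points already earned and simulate only the remaining fixtures.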

Paper Structure

This paper contains 31 sections, 7 equations, 11 figures, and 6 tables.

Figures (11)

  • Figure 1: End-to-end xG-based framework for season-level ranking simulation and inference
  • Figure 2: Teams grouped into quartiles by mid-season xG
  • Figure 3: Residualised second-half points vs first-half xG
  • Figure 4: Mid-season rank table: realised outcomes vs underlying performance
  • Figure 5: Mid-season rank-gap diagnostic
  • ...and 6 more figures