Leicester's Tale: Another Perspective on the EPL 2015/16 Through Expected Goals (xG) Modelling
Sheikh Badar Ud Din Tahir, Leonardo Egidi, Nicola Torelli
TL;DR
This study develops an inference-based probabilistic framework grounded in expected goals (xG) to simulate EPL 2015/16 season outcomes from shot-level data, producing season-wide distributions of points, ranks, and outcome probabilities. It compares three xG specifications and uses a Poisson process to propagate first-half xG into second-half match outcomes, enabling ex ante diagnostic signals and quantification of ranking uncertainty. The results show that xG captures the league's broad structure and treats rare events (e.g., Leicester City’s title) as low-probability but feasible realizations under partial information, rather than as deterministic predictions. Overall, xG-based simulations serve best as probabilistic baselines and early-warning tools, providing nuanced insights into team strength, performance deviation, and the likelihood of major season milestones. The framework highlights the importance of uncertainty quantification in football analytics and offers a transparent pipeline for applying xG-driven season analyses to other leagues.
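The core step described in the TL;DR, turning shot-level xG into match outcome probabilities, can be illustrated with a minimal sketch. It assumes that each team's goal count is an independent Poisson variable whose rate is the sum of that team's shot xG values. This is an illustrative assumption, not necessarily any of the three xG specifications the paper compares, and the shot values and function names (`match_outcome_probs`, `expected_points`) are hypothetical.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals under a Poisson(lam) distribution."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def match_outcome_probs(home_shots_xg, away_shots_xg, max_goals=10):
    """Return (P(home win), P(draw), P(away win)) from shot-level xG values.

    Assumption: each side's goal count is an independent Poisson variable
    whose rate is the sum of that side's shot xG values. Scorelines above
    max_goals are ignored (negligible for typical match xG).
    """
    lam_h, lam_a = sum(home_shots_xg), sum(away_shots_xg)
    p_home = p_draw = p_away = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, lam_h) * poisson_pmf(j, lam_a)
            if i > j:
                p_home += p
            elif i == j:
                p_draw += p
            else:
                p_away += p
    return p_home, p_draw, p_away

def expected_points(p_win: float, p_draw: float) -> float:
    """League scoring: 3 points for a win, 1 for a draw."""
    return 3 * p_win + p_draw

# Hypothetical shot-level xG values for one match (not data from the paper).
home_xg = [0.05, 0.12, 0.45, 0.08, 0.30]
away_xg = [0.10, 0.07, 0.20]
p_h, p_d, p_a = match_outcome_probs(home_xg, away_xg)
print(f"home {p_h:.3f}  draw {p_d:.3f}  away {p_a:.3f}")
print(f"home expected points: {expected_points(p_h, p_d):.2f}")
```

Summing these expected points over every fixture gives a team's xG-implied points total, which is one simple way to compare performance against the actual table.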
Abstract
Probabilistic modeling is an effective tool for evaluating team performance and predicting outcomes in sports. However, an important question that has not been fully explored is whether these models can reliably reflect actual performance while assigning meaningful probabilities to rare results that differ greatly from expectations. In this study, we develop an inference-based probabilistic framework built on expected goals (xG). This framework converts shot-level event data into season-level simulations of points, rankings, and outcome probabilities. Using the 2015/16 English Premier League season as a case study, we demonstrate that the framework captures the overall structure of the league table. It correctly identifies the top-four contenders and relegation candidates while explaining a significant portion of the variance in final points and ranks. In a full-season evaluation, the model assigns low probabilities to extreme outcomes; Leicester City's historic title win, in particular, stands out as a statistical anomaly. We then examine the ex ante inferential and early-diagnostic role of xG using only mid-season information. With first-half data, we simulate the remainder of the season and show that teams with stronger mid-season xG profiles tend to earn more points in the second half, even after accounting for their current league position. In this mid-season assessment, Leicester City ranks among the top teams by xG and is given a small but non-negligible chance of winning the league. This suggests that their ultimate success was unlikely but not entirely detached from their underlying performance. Our analysis indicates that expected goals models work best as probabilistic baselines and early-warning diagnostics, rather than as deterministic predictors of rare season outcomes.
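The mid-season exercise described in the abstract (simulating the remainder of the season from first-half information) can be sketched as a Monte Carlo loop. This is a minimal illustration, not the authors' implementation. Several choices are hypothetical: the rule that combines one team's xG created with its opponent's xG conceded, the omission of home advantage, the random tie-break in place of goal difference, and all team names and numbers.

```python
import math
import random
from collections import Counter

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_remaining_season(points, xg_for, xg_against, fixtures,
                              n_sims=10_000, seed=0):
    """Monte Carlo estimate of title probabilities and mean final points.

    points:     points per team after the first half of the season
    xg_for:     first-half xG created per match, per team
    xg_against: first-half xG conceded per match, per team
    fixtures:   remaining (home, away) pairs
    Assumption (hypothetical): the goal rate of team A against team B is the
    mean of A's xG-for and B's xG-against; home advantage is ignored.
    """
    rng = random.Random(seed)
    teams = list(points)
    titles = Counter()
    total_points = Counter()
    for _ in range(n_sims):
        final = dict(points)
        for home, away in fixtures:
            lam_h = (xg_for[home] + xg_against[away]) / 2
            lam_a = (xg_for[away] + xg_against[home]) / 2
            gh = sample_poisson(lam_h, rng)
            ga = sample_poisson(lam_a, rng)
            if gh > ga:
                final[home] += 3
            elif gh < ga:
                final[away] += 3
            else:
                final[home] += 1
                final[away] += 1
        best = max(final.values())
        leaders = [t for t in teams if final[t] == best]
        # Ties on points are broken at random (goal difference is not tracked).
        titles[rng.choice(leaders)] += 1
        for t in teams:
            total_points[t] += final[t]
    title_prob = {t: titles[t] / n_sims for t in teams}
    mean_points = {t: total_points[t] / n_sims for t in teams}
    return title_prob, mean_points

# Hypothetical four-team mini-league (illustrative numbers, not EPL data).
points = {"A": 20, "B": 18, "C": 15, "D": 10}
xg_for = {"A": 1.8, "B": 1.6, "C": 1.2, "D": 0.9}
xg_against = {"A": 0.9, "B": 1.0, "C": 1.3, "D": 1.7}
fixtures = [(h, a) for h in points for a in points if h != a]
title_prob, mean_points = simulate_remaining_season(
    points, xg_for, xg_against, fixtures, n_sims=5_000)
for team in points:
    print(team, f"P(title)={title_prob[team]:.3f}", f"E[pts]={mean_points[team]:.1f}")
```

Because each simulation produces a complete final table, the same loop can also collect rank distributions. A rare outcome such as an unexpected title then appears as a small share of simulated seasons rather than being ruled out.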
