Bayes-xG: Player and Position Correction on Expected Goals (xG) using Bayesian Hierarchical Approach

Alexander Scholtes, Oktay Karakuş

TL;DR

This work investigates whether player identity and playing position alter shot-to-goal probabilities beyond traditional shot features by applying Bayesian hierarchical logistic regression to football xG data. It compares baseline frequentist xG models with a StatsBomb benchmark, then builds three Bayes-xG variants to separate position and player effects, revealing that position effects largely disappear once detailed shot-context predictors are included, while player-specific xG adjustments persist across leagues. The study analyzes Premier League data and extends to La Liga and the Bundesliga, demonstrating cross-league generalization and highlighting the impact of prior choices on sampling efficiency. The results offer practical insights for scouting and performance evaluation by quantifying how individual players differ in finishing ability beyond contextual shot factors, and the paper discusses methodological implications for priors in complex hierarchical models.

Abstract

This study employs Bayesian methodologies to explore the influence of player or positional factors in predicting the probability of a shot resulting in a goal, measured by the expected goals (xG) metric. Utilising publicly available data from StatsBomb, Bayesian hierarchical logistic regressions are constructed, analysing approximately 10,000 shots from the English Premier League to ascertain whether positional or player-level effects impact xG. The findings reveal positional effects in a basic model that includes only distance to goal and shot angle as predictors, highlighting that strikers and attacking midfielders exhibit a higher likelihood of scoring. However, these effects diminish when more informative predictors are introduced. Nevertheless, even with additional predictors, player-level effects persist, indicating that certain players possess notable positive or negative xG adjustments, influencing their likelihood of scoring a given chance. The study extends its analysis to data from Spain's La Liga and Germany's Bundesliga, yielding comparable results. Additionally, the paper assesses the impact of prior distribution choices on outcomes, concluding that the priors employed in the models provide sound results but could be refined to improve sampling efficiency, which would make more complex and extensive models feasible.
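To make the idea of an "xG adjustment" concrete, the following is a minimal sketch of how a group-level effect (for a position or a player) might shift a shot's scoring probability on the logit scale. It is not the paper's fitted model: the coefficient values, the function names, and the definition of shot angle as the angle subtended by the goalposts are all illustrative assumptions, as is the use of StatsBomb-style pitch coordinates (a 120 x 80 pitch with the goal centred at x = 120, y = 40 and 8 units wide).

```python
import math


def sigmoid(z: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def shot_features(x: float, y: float,
                  goal_x: float = 120.0, goal_y: float = 40.0,
                  goal_width: float = 8.0) -> tuple[float, float]:
    """Return (distance to goal centre, angle subtended by the posts in radians).

    Pitch geometry defaults are assumptions (StatsBomb-style coordinates).
    """
    dx = goal_x - x
    distance = math.hypot(dx, y - goal_y)
    to_far_post = math.atan2(goal_y + goal_width / 2 - y, dx)
    to_near_post = math.atan2(goal_y - goal_width / 2 - y, dx)
    return distance, abs(to_far_post - to_near_post)


def adjusted_xg(distance: float, angle: float, group_effect: float = 0.0,
                intercept: float = -1.0, b_distance: float = -0.1,
                b_angle: float = 1.5) -> float:
    """Logistic xG with an additive group-level effect on the logit scale.

    All coefficients are hypothetical placeholders, not estimates from the paper.
    In a hierarchical model, group_effect would be drawn from a shared prior
    and estimated from the data for each position or player.
    """
    eta = intercept + b_distance * distance + b_angle * angle + group_effect
    return sigmoid(eta)


# A shot from the penalty spot under the assumed coordinates.
distance, angle = shot_features(108.0, 40.0)
baseline = adjusted_xg(distance, angle)
strong_finisher = adjusted_xg(distance, angle, group_effect=0.3)
weak_finisher = adjusted_xg(distance, angle, group_effect=-0.3)
```

Because the effect is added before the sigmoid, the same adjustment changes the probability by different amounts depending on how difficult the chance already is, which is one reason to model finishing ability on the logit scale and not as a flat percentage bonus.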

Paper Structure

This paper contains 17 sections, 8 equations, 15 figures, and 8 tables.

Figures (15)

  • Figure 1: Relationships between shot angle (in bins of 20), distance to goal (in bins of 10), and the proportion of shots resulting in goals.
  • Figure 2: Distributions of Predictions from Frequentist xG Models
  • Figure 3: Model fitting performance when increasing the number of features.
  • Figure 4: Distributions of xG Adjustments by Position of Bayes-xG$_1$.
  • Figure 5: Normalized Heatmap of Shot Locations by General Position.
  • ...and 10 more figures