
Beyond Expected Goals: A Probabilistic Framework for Shot Occurrences in Soccer

Jonathan Pipping, Tianshu Feng, R. Paul Sabin

TL;DR

This work extends traditional expected goals (xG) by introducing xG+, a possession-level framework that jointly models shot-creation and shot-conversion probabilities. By aggregating across a possession, xG+ accounts for near-misses and sequential attack dynamics, addressing the conditioning-on-shots limitation of standard xG. Using Gradient Sports EPL tracking data and XGBoost, the authors show that xG+ improves team-level predictions and yields more persistent player signals than xG alone. The study provides insights into feature importance, validates across seasons, and outlines future directions including sequence modeling and defensive credit.

Abstract

Expected goals (xG) models estimate the probability that a shot results in a goal from its context (e.g., location, pressure), but they operate only on observed shots. We propose xG+, a possession-level framework that first estimates the probability that a shot occurs within the next second and its corresponding xG if it were to occur. We also introduce ways to aggregate this joint probability estimate over the course of a possession. By jointly modeling shot-taking behavior and shot quality, xG+ remedies the conditioning-on-shots limitation of standard xG. We show that this improves predictive accuracy at the team level and produces a more persistent player skill signal than standard xG models.
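The abstract's central quantity can be illustrated with a short sketch. At each moment in a possession, the probability of a shot occurring in the next second is multiplied by the xG that shot would carry, and the resulting per-moment values are then aggregated over the possession. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names are hypothetical. The inputs are assumed to come from separately trained shot-occurrence and shot-conversion models (presumably the xS and xG models referenced in the figure list). Each value is assumed to correspond to one non-overlapping one-second window. The three aggregation rules shown (max, sum, and a "union" rule that treats windows as independent) are illustrative choices, not necessarily the aggregations the paper proposes.

```python
from typing import Sequence


def frame_xg_plus(p_shot: float, xg_if_shot: float) -> float:
    """Joint probability for one window: P(shot in next second) * P(goal | shot).

    Hypothetical helper; both inputs are assumed to be calibrated model outputs.
    """
    if not (0.0 <= p_shot <= 1.0 and 0.0 <= xg_if_shot <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    return p_shot * xg_if_shot


def aggregate_possession(
    p_shots: Sequence[float],
    xgs: Sequence[float],
    method: str = "union",
) -> float:
    """Aggregate per-window joint probabilities over one possession.

    Hypothetical helper. Assumes one (p_shot, xg) pair per non-overlapping
    one-second window. The rules below are illustrative, not the paper's:
      - "max":   value of the single most dangerous window
      - "sum":   total expected goals-from-shots across windows (can exceed 1)
      - "union": 1 - prod(1 - joint), i.e. P(at least one goal) if windows
                 were independent (a simplifying assumption; adjacent windows
                 in a real possession are strongly correlated)
    """
    if len(p_shots) != len(xgs):
        raise ValueError("p_shots and xgs must have the same length")
    joint = [frame_xg_plus(p, g) for p, g in zip(p_shots, xgs)]

    if method == "max":
        return max(joint, default=0.0)
    if method == "sum":
        return sum(joint)
    if method == "union":
        prob_no_goal = 1.0
        for j in joint:
            prob_no_goal *= 1.0 - j
        return 1.0 - prob_no_goal
    raise ValueError(f"unknown aggregation method: {method!r}")


if __name__ == "__main__":
    # Made-up possession: shot likelihood and shot quality both rise
    # as the attack approaches goal.
    p_shots = [0.1, 0.4, 0.8]
    xgs = [0.05, 0.2, 0.3]
    for m in ("max", "sum", "union"):
        print(m, round(aggregate_possession(p_shots, xgs, m), 6))
```

In this toy possession no shot needs to be observed for the attack to receive credit, which is the gap in standard xG that the abstract describes. The choice of aggregation rule matters: "sum" can exceed 1 on long possessions, while "union" stays a probability but overstates danger when consecutive windows are nearly identical.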

Paper Structure

This paper contains 20 sections, 6 equations, 8 figures, 12 tables.

Figures (8)

  • Figure 1: A comparison of a shot with low goal probability and a cross with a much higher goal probability that never became a shot.
  • Figure 2: Kylian Mbappé demonstrates his ability to create high-quality shots by making his man miss against Manchester City (Feb 19, 2025). Traditional xG metrics only consider the probability of a goal once he shoots, which is why some elite goal-scorers fail to consistently outperform their xG. Their xG is high because they created better chances!
  • Figure 3: xS feature importance based on information gain
  • Figure 4: xG feature importance based on information gain
  • Figure 5: Partial dependence plots (PDPs) for key variables affecting xS.
  • ...and 3 more figures