Table of Contents
Fetching ...

Automated Explanation of Machine Learning Models of Footballing Actions in Words

Pegah Rahimian, Jernej Flisar, David Sumpter

TL;DR

This work addresses the gap between machine-learning predictions for football and coaching staff explanations. It combines a logistic-regression $xG$ model with a Wordalisation layer that converts feature contributions into natural-language narratives via LLM prompts, using percentile-based categorization for $xG$ and its features. In practice, the model uses $P(y=1|x) = 1/(1+e^{-log-odds})$ with $log-odds = beta0 + \sum_j beta_j x_j$, and contributions are $Contribution_j = beta_j * (x_j - mu_j)$. The authors provide a model card and an open-source Streamlit app to demonstrate explanations and discuss extensions to other football actions and real-world coaching use.

Abstract

While football analytics has changed the way teams and analysts assess performance, there remains a communication gap between machine learning practice and how coaching staff talk about football. Coaches and practitioners require actionable insights, which are not always provided by models. To bridge this gap, we show how to build wordalizations (a novel approach that leverages large language models) for shots in football. Specifically, we first build an expected goals model using logistic regression. We then use the co-efficients of this regression model to write sentences describing how factors (such as distance, angle and defensive pressure) contribute to the model's prediction. Finally, we use large language models to give an entertaining description of the shot. We describe our approach in a model card and provide an interactive open-source application describing shots in recent tournaments. We discuss how shot wordalisations might aid communication in coaching and football commentary, and give a further example of how the same approach can be applied to other actions in football.

Automated Explanation of Machine Learning Models of Footballing Actions in Words

TL;DR

This work addresses the gap between machine-learning predictions for football and coaching staff explanations. It combines a logistic-regression model with a Wordalisation layer that converts feature contributions into natural-language narratives via LLM prompts, using percentile-based categorization for and its features. In practice, the model uses with , and contributions are . The authors provide a model card and an open-source Streamlit app to demonstrate explanations and discuss extensions to other football actions and real-world coaching use.

Abstract

While football analytics has changed the way teams and analysts assess performance, there remains a communication gap between machine learning practice and how coaching staff talk about football. Coaches and practitioners require actionable insights, which are not always provided by models. To bridge this gap, we show how to build wordalizations (a novel approach that leverages large language models) for shots in football. Specifically, we first build an expected goals model using logistic regression. We then use the co-efficients of this regression model to write sentences describing how factors (such as distance, angle and defensive pressure) contribute to the model's prediction. Finally, we use large language models to give an entertaining description of the shot. We describe our approach in a model card and provide an interactive open-source application describing shots in recent tournaments. We discuss how shot wordalisations might aid communication in coaching and football commentary, and give a further example of how the same approach can be applied to other actions in football.

Paper Structure

This paper contains 14 sections, 5 equations, 8 figures.

Figures (8)

  • Figure 1: Overview of the proposed workflow, comprising the data pipeline for feature extraction, model training, and feature contribution analysis, along with the wordalisation process that integrates data source, description, and LLM chat modules. The output is an engaging and accurate LLM generated text.
  • Figure 2: Illustration of various football features including shot location, goalkeeper position, opponent pressure, and teammates' positions.
  • Figure 3: Analysis of two shots from Germany vs. Scotland in EURO 2024. The top row shows the 56th-minute shot, with the pitch visual on the left and the contribution plot on the right. The bottom row shows the 85th-minute shot, with the pitch visual on the left and the contribution plot on the right.
  • Figure 4: Wordalisation workflow for shots
  • Figure 5: Example synthesized text and few-shot example prompt.
  • ...and 3 more figures