Automated Explanation of Machine Learning Models of Footballing Actions in Words
Pegah Rahimian, Jernej Flisar, David Sumpter
TL;DR
This work addresses the gap between machine-learning predictions in football and the way coaching staff explain and discuss the game. It combines a logistic-regression expected-goals ($xG$) model with a Wordalisation layer that converts feature contributions into natural-language narratives via LLM prompts, using percentile-based categorization for $xG$ and its features. The model estimates $P(y=1 \mid x) = \frac{1}{1+e^{-z}}$, where $z = \beta_0 + \sum_j \beta_j x_j$ is the log-odds, and the contribution of feature $j$ is $\mathrm{Contribution}_j = \beta_j (x_j - \mu_j)$, where $\mu_j$ is the mean value of feature $j$. The authors provide a model card and an open-source Streamlit app that demonstrate the explanations, and discuss extensions to other football actions and to real-world coaching use.
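The logistic $xG$ formula and the per-feature contributions can be illustrated with a minimal Python sketch. The intercept `BETA0`, the coefficients `BETAS`, the feature means `MU` and the feature names are hypothetical values chosen for illustration, not the fitted model from the paper; distance is assumed to be in metres, angle in radians, and pressure an abstract defensive-pressure score.

```python
import math

# Hypothetical coefficients for illustration only (not the paper's fitted values).
BETA0 = 0.5
BETAS = {"distance": -0.12, "angle": 1.1, "pressure": -0.4}
# Hypothetical feature means mu_j, used as the reference point for contributions.
MU = {"distance": 16.0, "angle": 0.45, "pressure": 1.0}


def log_odds(shot):
    """z = beta_0 + sum_j beta_j * x_j."""
    return BETA0 + sum(BETAS[f] * shot[f] for f in BETAS)


def xg(shot):
    """P(y = 1 | x) = 1 / (1 + exp(-z)), the logistic link."""
    return 1.0 / (1.0 + math.exp(-log_odds(shot)))


def contributions(shot):
    """Contribution_j = beta_j * (x_j - mu_j), on the log-odds scale."""
    return {f: BETAS[f] * (shot[f] - MU[f]) for f in BETAS}


# A close-range, wide-angle, unpressured shot (hypothetical values).
shot = {"distance": 8.0, "angle": 0.9, "pressure": 0.0}
print(round(xg(shot), 3))  # 0.629
print(contributions(shot))
```

Because the model is linear in the log-odds, the contributions sum exactly to $z(x) - z(\mu)$, the shift in log-odds relative to a shot with average features. They therefore say how much each factor moved the prediction away from a typical shot, but they are not additive on the probability scale.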
Abstract
While football analytics has changed the way teams and analysts assess performance, there remains a communication gap between machine learning practice and how coaching staff talk about football. Coaches and practitioners require actionable insights, which are not always provided by models. To bridge this gap, we show how to build wordalisations (a novel approach that leverages large language models) for shots in football. Specifically, we first build an expected goals model using logistic regression. We then use the coefficients of this regression model to write sentences describing how factors (such as distance, angle and defensive pressure) contribute to the model's prediction. Finally, we use large language models to give an entertaining description of the shot. We describe our approach in a model card and provide an interactive open-source application describing shots in recent tournaments. We discuss how shot wordalisations might aid communication in coaching and football commentary, and give a further example of how the same approach can be applied to other actions in football.
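The step from coefficients to sentences to an LLM prompt can be sketched as follows. Everything here is an assumption for illustration: the percentile cut-offs in `describe_level`, the contribution threshold and wording in `describe_contribution`, the prompt text in `build_prompt`, and the reference $xG$ distribution are hypothetical, not the templates used by the authors. The sketch only builds the prompt string; sending it to a language model is left out.

```python
from bisect import bisect_right


def percentile(value, reference):
    """Percentage of reference values that are <= value (0-100)."""
    ordered = sorted(reference)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def describe_level(pct):
    """Map a percentile to a verbal band (hypothetical cut-offs)."""
    if pct >= 90:
        return "exceptionally high"
    if pct >= 66:
        return "high"
    if pct >= 33:
        return "average"
    return "low"


def describe_contribution(feature, contribution, threshold=0.1):
    """Turn the sign of beta_j * (x_j - mu_j) into a sentence (hypothetical wording)."""
    if contribution > threshold:
        return f"The {feature} increased the chance of scoring."
    if contribution < -threshold:
        return f"The {feature} decreased the chance of scoring."
    return f"The {feature} had little effect on the chance of scoring."


def build_prompt(xg_value, xg_reference, contribs):
    """Assemble factual sentences into a prompt for a language model."""
    level = describe_level(percentile(xg_value, xg_reference))
    facts = [f"The shot had an xG of {xg_value:.2f}, which is {level} compared with other shots."]
    facts += [describe_contribution(f, c) for f, c in contribs.items()]
    return (
        "Using only the facts below, describe this shot in the style of a football commentator.\n"
        + "\n".join(facts)
    )


# Hypothetical reference distribution of xG values from other shots.
reference_xg = [0.02, 0.05, 0.08, 0.10, 0.15, 0.30, 0.45, 0.60, 0.76, 0.90]
example_contribs = {"distance to goal": 0.96, "shooting angle": 0.495, "defensive pressure": -0.05}
print(build_prompt(0.63, reference_xg, example_contribs))
```

Keeping the numbers and their interpretation in deterministic, template-generated sentences, and asking the language model only to restyle them, limits the room for the model to invent facts about the shot.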
