A Foundation Model for Soccer

Ethan Baron; Daniel Hocevar; Zach Salehe

TL;DR

This work introduces a decoder-only transformer foundation model for soccer action sequences, trained to predict the next action from the preceding actions using play-by-play data from three seasons of the FA Women's Super League. Actions are represented in the SPADL format and discretized into tokens, with field locations mapped to a 100-bin grid; the model uses action embeddings and is trained with a cross-entropy loss. Two transformer sizes are evaluated against Markov and MLP baselines. The large transformer achieves the best predictive accuracy and probability calibration, and the scaling analyses suggest that more data helps while very large context windows yield diminishing gains. Visualizations of the learned embeddings show meaningful clustering by action type and field location, and qualitative examples illustrate both successes and typical failure modes. The approach enables sequence generation, downstream predictive analytics, and richer player and team representations for soccer analytics.
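The discretization step described above can be illustrated with a minimal sketch. It is not taken from the authors' repository: the 10 x 10 layout of the 100 bins, the 105 m x 68 m field dimensions (the usual SPADL convention), the shortened `ACTION_TYPES` list, and the function names `location_bin` and `action_token` are all assumptions made for illustration.

```python
# Hypothetical sketch: map one SPADL-style action to a single integer token.
# All constants and names below are illustrative assumptions, not the paper's code.

FIELD_LENGTH = 105.0  # metres (assumed SPADL convention)
FIELD_WIDTH = 68.0    # metres (assumed SPADL convention)
GRID_X = 10           # assumption: 100 bins arranged as a 10 x 10 grid
GRID_Y = 10

# Assumption: a small illustrative subset of action types.
ACTION_TYPES = ["pass", "dribble", "shot", "cross", "tackle"]


def location_bin(x: float, y: float) -> int:
    """Return the grid-cell index (0..99) containing field position (x, y)."""
    col = int(x / FIELD_LENGTH * GRID_X)
    row = int(y / FIELD_WIDTH * GRID_Y)
    # Clamp so that positions on or beyond the boundary fall in an edge cell.
    col = max(0, min(col, GRID_X - 1))
    row = max(0, min(row, GRID_Y - 1))
    return row * GRID_X + col


def action_token(action_type: str, x: float, y: float) -> int:
    """Combine action type and location bin into one token id."""
    type_index = ACTION_TYPES.index(action_type)
    return type_index * (GRID_X * GRID_Y) + location_bin(x, y)


if __name__ == "__main__":
    # A shot taken from the centre of the field.
    print(action_token("shot", 52.5, 34.0))
```

Under these assumptions the vocabulary size is the number of action types times 100, so every (type, location) pair has its own embedding row.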

Abstract

We propose a foundation model for soccer that predicts subsequent actions in a match from a given input sequence of actions. As a proof of concept, we train a transformer architecture on three seasons of data from a professional soccer league. We compare the performance of this transformer, both quantitatively and qualitatively, to two baseline models: a Markov model and a multi-layer perceptron. Additionally, we discuss potential applications of our model. We provide an open-source implementation of our methods at https://github.com/danielhocevar/Foundation-Model-for-Soccer.
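To make the Markov baseline concrete, here is a minimal sketch of a next-action predictor built from transition counts. The abstract does not state the order of the Markov model or how it is estimated, so the first-order, count-based formulation and the names `fit_markov`, `next_action_probs`, and `predict_next` are assumptions for illustration only.

```python
# Hypothetical sketch of a first-order Markov baseline over action tokens.
# The model order and the estimation method are assumptions, not the paper's code.
from collections import Counter


def fit_markov(sequences):
    """Count how often each token follows each other token."""
    counts = {}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts.setdefault(prev, Counter())[nxt] += 1
    return counts


def next_action_probs(counts, prev):
    """Return the empirical distribution over the next token given `prev`."""
    followers = counts.get(prev)
    if not followers:
        return {}  # `prev` was never seen with a successor in training
    total = sum(followers.values())
    return {token: n / total for token, n in followers.items()}


def predict_next(counts, prev):
    """Return the most frequent successor of `prev`, or None if unseen."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy token sequences standing in for tokenized possessions.
    toy_sequences = [[1, 2, 3], [1, 2, 4], [1, 2, 3]]
    model = fit_markov(toy_sequences)
    print(predict_next(model, 2), next_action_probs(model, 2))
```

A model of this kind conditions only on the single previous action, which is the limitation a transformer with a longer context window is meant to address.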

Paper Structure (15 sections, 3 equations, 6 figures, 2 tables)

Introduction
Background
Deep Learning for Sequence Modeling
Modeling Soccer Actions
Methods
Dataset
Neural Network Architecture
Baseline Models
Results
Quantitative Results
Transformer Scaling Laws
Visualizing Embeddings
Example Outputs
Potential Applications
Conclusion

Figures (6)

Figure 1: Model architecture diagram
Figure 2: Plots showing how the validation accuracy of the model varies depending on dataset size, context size and number of parameters
Figure 3: Visualizations of individual play embeddings
Figure 4: Sequence where models fail
Figure 5: Another sequence where models fail
...and 1 more figure
