Forecasting Events in Soccer Matches Through Language
Tiago Mendes-Neves, Luís Meireles, João Mendes-Moreira
TL;DR
The paper forecasts the next event in a soccer match by treating a match as a sequence of events and, drawing on Large Language Models, building a single language-based Large Event Model (LEM). It uses the public WyScout dataset with ordinal-encoded event tokens and predicts one token at a time, which allows end-to-end generation of entire event chains and scalable simulation for analytics pipelines. Experiments show gains over previous LEM proposals in predicting the next event type and in spatial accuracy. The same model also supports situational xG maps, momentum-like short-term probabilities, and long-term match outcome forecasts, and its VAEP valuations broadly align with expected scoring opportunities. The work offers a scalable backbone for diverse soccer analytics tasks and identifies avenues for future improvement, such as richer contextual inputs and more advanced architectures.
Abstract
This paper introduces an approach to predicting the next event in a soccer match, a challenge that closely resembles the problem faced by Large Language Models (LLMs). Unlike other methods that severely limit event dynamics in soccer, often by abstracting away many variables or by relying on a mix of sequential models, our research proposes a novel technique inspired by the methodologies used in LLMs. Models built this way predict the complete chain of variables that compose an event, which significantly simplifies the construction of Large Event Models (LEMs) for soccer. Using deep learning on the publicly available WyScout dataset, the proposed approach notably surpasses previous LEM proposals in critical areas, such as prediction accuracy for the next event type. This paper highlights the utility of LEMs in various applications, including match prediction and analytics. Moreover, we show that LEMs provide a simulation backbone on which users can build many analytics pipelines, in contrast to the current practice of training specialized single-purpose models. LEMs represent a pivotal advancement in soccer analytics, establishing a foundational framework for multifaceted analytics pipelines through a single machine-learning model.
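
To make the token-by-token idea concrete, the following minimal Python sketch shows how events might be ordinal-encoded into a flat token sequence and how a whole event could then be generated one variable at a time. It is an illustration under stated assumptions, not the authors' implementation: the event vocabulary, the three-field layout (`type`, `x`, `y`), the grid size, and every function name are hypothetical, and a uniform random predictor stands in for the trained deep learning model.

```python
import random
from typing import Callable, Dict, List

# Hypothetical vocabulary and layout; the paper's actual schema may differ.
EVENT_TYPES = ["pass", "shot", "duel", "foul"]
FIELDS = ("type", "x", "y")   # assumed order of variables within one event
GRID = 10                     # assumed: coordinates binned to 0..GRID-1

Predictor = Callable[[List[int], str], int]


def encode_event(event: Dict[str, object]) -> List[int]:
    """Ordinal-encode one event as [type_id, x_bin, y_bin]."""
    return [EVENT_TYPES.index(str(event["type"])), int(event["x"]), int(event["y"])]


def decode_event(tokens: List[int]) -> Dict[str, object]:
    """Inverse of encode_event."""
    type_id, x_bin, y_bin = tokens
    return {"type": EVENT_TYPES[type_id], "x": x_bin, "y": y_bin}


def flatten(events: List[Dict[str, object]]) -> List[int]:
    """Concatenate the tokens of all events into one sequence."""
    return [tok for event in events for tok in encode_event(event)]


def generate_next_event(context: List[int], predict_token: Predictor) -> Dict[str, object]:
    """Generate one event field by field, feeding each new token back in."""
    tokens: List[int] = []
    for field in FIELDS:
        tokens.append(predict_token(context + tokens, field))
    return decode_event(tokens)


def simulate(events: List[Dict[str, object]], predict_token: Predictor,
             n_events: int) -> List[Dict[str, object]]:
    """Roll the sequence forward by n_events generated events."""
    context = flatten(events)
    generated = []
    for _ in range(n_events):
        event = generate_next_event(context, predict_token)
        generated.append(event)
        context += encode_event(event)  # generated events become context
    return generated


def uniform_predictor(rng: random.Random) -> Predictor:
    """Placeholder for a trained model: samples each token uniformly."""
    def predict(context: List[int], field: str) -> int:
        upper = len(EVENT_TYPES) if field == "type" else GRID
        return rng.randrange(upper)
    return predict


history = [{"type": "pass", "x": 3, "y": 4}, {"type": "duel", "x": 5, "y": 4}]
future = simulate(history, uniform_predictor(random.Random(0)), n_events=5)
```

Because a single predictor produces every variable of every event, repeated calls to `simulate` from the same history would give a sample of possible continuations, which is the sense in which one model can serve as a simulation backbone for several downstream analyses.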
