Modeling Matches as Language: A Generative Transformer Approach for Counterfactual Player Valuation in Football

Miru Hong; Minho Lee; Geonhee Jo; Hyeokje Jo; Pascal Bauer; Sang-Ki Ko

Abstract

Evaluating football player transfers is challenging because a player's actions depend strongly on tactical systems, teammates, and match context. Despite this complexity, recruitment decisions often rely on static statistics and subjective expert judgment, neither of which fully accounts for these contextual factors. This limitation stems largely from the absence of counterfactual simulation mechanisms that can predict outcomes in hypothetical scenarios. To address this, we propose ScoutGPT, a generative model that treats football match events as sequential tokens within a language modeling framework. Built on a nanoGPT-based Transformer trained with next-token prediction, ScoutGPT learns the dynamics of match event sequences and outperforms existing baseline models in predictive performance. Because the model is generative, it can simulate event sequences under hypothetical lineups, and Monte Carlo sampling over these simulations enables counterfactual assessment of scenarios that were never observed. Experiments on K League data show that simulated player transfers lead to measurable changes in offensive progression and goal probabilities, indicating that ScoutGPT captures player-specific impact beyond traditional static metrics.
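
To make the Monte Carlo counterfactual idea concrete, the following is a minimal sketch and not the ScoutGPT implementation. It assumes a toy stand-in for the trained next-token model (`toy_next_event_probs`), a hypothetical five-token event vocabulary, and a single scalar `player_skill` in place of learned player embeddings. All of these names and numbers are illustrative assumptions. The sketch samples many possessions under two "lineups" and compares the estimated goal probability.

```python
import random

# Hypothetical event vocabulary; the paper's actual token scheme is not shown here.
EVENTS = ["pass", "dribble", "shot_goal", "shot_miss", "turnover"]
TERMINAL = {"shot_goal", "shot_miss", "turnover"}


def toy_next_event_probs(history, player_skill):
    """Toy stand-in for a trained next-token model (an assumption, NOT ScoutGPT).

    Returns one probability per entry in EVENTS. A higher player_skill
    (expected in [0, 1]) raises the goal weight and lowers the turnover
    weight; longer possessions make shots more likely.
    """
    progress = min(len(history), 5) / 5.0
    weights = [
        3.0,                                  # pass
        1.5,                                  # dribble
        0.2 + 0.8 * progress * player_skill,  # shot_goal
        0.3 + 0.5 * progress,                 # shot_miss
        1.0 - 0.5 * player_skill,             # turnover
    ]
    total = sum(weights)
    return [w / total for w in weights]


def sample_episode(player_skill, rng, max_len=20):
    """Autoregressively sample one possession until a terminal event or max_len."""
    history = []
    while len(history) < max_len:
        probs = toy_next_event_probs(history, player_skill)
        event = rng.choices(EVENTS, weights=probs, k=1)[0]
        history.append(event)
        if event in TERMINAL:
            break
    return history


def goal_probability(player_skill, n_samples, rng):
    """Monte Carlo estimate of the probability that a possession ends in a goal."""
    goals = sum(
        sample_episode(player_skill, rng)[-1] == "shot_goal"
        for _ in range(n_samples)
    )
    return goals / n_samples


rng = random.Random(0)
p_original = goal_probability(0.4, 5000, rng)  # observed lineup (illustrative value)
p_swapped = goal_probability(0.9, 5000, rng)   # hypothetical replacement player
delta = p_swapped - p_original                 # estimated counterfactual effect
print(f"original={p_original:.3f} swapped={p_swapped:.3f} delta={delta:.3f}")
```

In a real setting the toy probability function would be replaced by the trained model's predicted distribution conditioned on the lineup, and the number of samples would be chosen so that the estimated difference is stable.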

Paper Structure (36 sections, 11 equations, 3 figures, 9 tables)

Introduction
Related Work
Data-Driven Player Valuation
Generative Modeling of Sports Data
Sequence Modeling for Event Streams
Counterfactual Simulation in Sports
Methodology
Data Representation and Verification
Problem Formulation
Structured Event Tokenization
Context Encoding
Event Encoding
ScoutGPT Architecture
Backbone
Auxiliary Heads for Value Estimation
...and 21 more sections

Figures (3)

Figure 1: Overview of the ScoutGPT framework. Our nanoGPT-based Transformer model autoregressively predicts event tokens, enabling counterfactual 'what-if' simulations. For instance, replacing Kevin De Bruyne with Scott McTominay could alter actions (e.g., pass/shot) or modify the same action with a different location, outcome, or VAEP.
Figure 2: Comparison of the mean absolute delta under different numbers of samples. The left shows the per-episode mean absolute delta, while the right shows the cumulative mean absolute delta.
Figure 3: $t$-SNE projection of ScoutGPT player embeddings from the 2024 K League season, colored by positional role.
