
RisingBALLER: A player is a token, a match is a sentence, A path towards a foundational model for football players data analytics

Akedjou Achraff Adjileye

TL;DR

RisingBALLER is a comprehensive framework for football data analytics that learns high-level foundational features for players, taking the context of each match into account, and offers a deeper understanding of players than individual statistics alone.

Abstract

In this paper, I introduce RisingBALLER, the first publicly available approach that uses a transformer model trained on football match data to learn match-specific player representations. Drawing inspiration from advances in language modeling, RisingBALLER treats each football match as a unique sequence in which players serve as tokens, with their embeddings shaped by the specific context of the match. By using masked player prediction (MPP) as a pre-training task, RisingBALLER learns foundational features for football player representations, much as language models learn semantic features for text representations. As a downstream task, I introduce next match statistics prediction (NMSP) to demonstrate the effectiveness of the learned player embeddings. The NMSP model outperforms a strong baseline commonly used for performance forecasting within the community. Furthermore, I conduct an in-depth analysis showing how the embeddings learned by RisingBALLER can be used in various football analytics tasks: producing positional features that capture the variety of player roles beyond rigid (x, y) coordinates, estimating team cohesion, and retrieving similar players for more effective data-driven scouting. More than a simple machine learning model, RisingBALLER is a comprehensive framework for football data analytics that learns high-level foundational player features in the context of each match, offering a deeper understanding of players than individual statistics alone.
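The masked player prediction objective can be illustrated with a BERT-style masking routine. The sketch below is an illustration under stated assumptions, not the paper's implementation: it treats one match as a list of integer player IDs, replaces a random subset (15% by default, a hypothetical rate borrowed from BERT) with a hypothetical reserved `MASK_ID`, and computes cross-entropy only at the masked positions. The names `mask_players`, `mpp_loss`, `MASK_ID`, and `IGNORE_INDEX` are illustrative.

```python
import math
import random
from typing import List, Tuple

MASK_ID = 0          # hypothetical reserved ID for the [MASK] player token
IGNORE_INDEX = -100  # label for positions that do not contribute to the loss


def mask_players(player_ids: List[int], mask_prob: float = 0.15,
                 seed: int = 0) -> Tuple[List[int], List[int]]:
    """Mask a random subset of player tokens in one match sequence.

    Returns (inputs, labels): masked positions in `inputs` hold MASK_ID, and
    `labels` holds the original player ID there and IGNORE_INDEX elsewhere.
    """
    if not player_ids:
        return [], []
    rng = random.Random(seed)
    inputs: List[int] = []
    labels: List[int] = []
    for pid in player_ids:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)
            labels.append(pid)
        else:
            inputs.append(pid)
            labels.append(IGNORE_INDEX)
    # Guarantee at least one prediction target per match.
    if all(label == IGNORE_INDEX for label in labels):
        i = rng.randrange(len(player_ids))
        inputs[i] = MASK_ID
        labels[i] = player_ids[i]
    return inputs, labels


def mpp_loss(logits: List[List[float]], labels: List[int]) -> float:
    """Mean cross-entropy over masked positions only.

    `logits[t]` scores every player ID in the vocabulary for position t.
    """
    total, count = 0.0, 0
    for row, target in zip(logits, labels):
        if target == IGNORE_INDEX:
            continue
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[target]
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    match = list(range(1, 23))  # hypothetical: 22 starters with IDs 1..22
    inputs, labels = mask_players(match, seed=42)
    uniform = [[0.0] * 23 for _ in match]  # vocabulary: MASK_ID plus 22 players
    print(inputs)
    print(mpp_loss(uniform, labels))  # uniform logits give log(23)
```

Restricting the loss to masked positions means the model can only succeed by inferring each hidden player from the remaining players in the same match, which is what makes the learned embeddings context-dependent.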

Paper Structure

This paper contains 25 sections, 6 figures, and 7 tables.

Figures (6)

  • Figure 1: Architecture of RisingBALLER. Each player in the match dataset is treated as a token whose unique ID is embedded into a D-dimensional feature vector (PE). This ID embedding is combined with additional embeddings representing the player's spatial position on the field (SPE) and team affiliation (TE). The player's match event data is also projected into the same D-dimensional space and serves as a temporal positional embedding (TPE). These four vectors are combined to initialize the player representation before it is fed into the attention network.
  • Figure 2: Left: Positional embeddings clustered into 2 groups. Right: Positional embeddings clustered into 3 groups.
  • Figure 3: Validation curves of the two architectures on NMSP, with and without MPP pretraining (fs: trained from scratch; ft: fine-tuned). The ovals mark the regions of convergence with the lowest losses.
  • Figure 4:
  • Figure 5: Dissimilarity heatmap of player affiliation embeddings. Player embeddings are compared using cosine similarity and are taken from the best-performing model on MPP in this setup, 1l64d (scores in Table 1).
  • ...and 1 more figure
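The input layer described in the Figure 1 caption can be sketched in plain Python. This is an illustration under stated assumptions, not the author's code: the caption only says the four vectors are "combined", and the sketch assumes element-wise summation (as in BERT-style input embeddings); it also assumes the match event data is a fixed-length vector of per-player statistics mapped to D dimensions by a single linear projection. The class name `PlayerInputEmbedding`, the table sizes, and the random initialisation are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def make_table(rows: int, dim: int, rng: random.Random) -> List[Vector]:
    """Randomly initialised lookup table with `rows` vectors of size `dim`."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]


class PlayerInputEmbedding:
    """Initial representation of one player token (sketch of Figure 1).

    Assumption: PE, SPE, TE and TPE are combined by element-wise summation.
    """

    def __init__(self, n_players: int, n_positions: int, n_teams: int,
                 n_stats: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        self.pe = make_table(n_players, dim, rng)     # player ID embedding (PE)
        self.spe = make_table(n_positions, dim, rng)  # spatial position embedding (SPE)
        self.te = make_table(n_teams, dim, rng)       # team affiliation embedding (TE)
        self.w = make_table(n_stats, dim, rng)        # stats -> D projection weights
        self.b = [0.0] * dim                          # projection bias

    def tpe(self, stats: Sequence[float]) -> Vector:
        """Project the player's match event statistics into D dimensions (TPE)."""
        return [self.b[j] + sum(s * self.w[k][j] for k, s in enumerate(stats))
                for j in range(self.dim)]

    def __call__(self, player_id: int, position_id: int, team_id: int,
                 stats: Sequence[float]) -> Vector:
        t = self.tpe(stats)
        return [self.pe[player_id][j] + self.spe[position_id][j]
                + self.te[team_id][j] + t[j] for j in range(self.dim)]

    def embed_match(self, tokens: Sequence[Tuple[int, int, int, Sequence[float]]]
                    ) -> List[Vector]:
        """Embed every player token of a match; the result feeds the attention network."""
        return [self(*tok) for tok in tokens]


if __name__ == "__main__":
    emb = PlayerInputEmbedding(n_players=100, n_positions=16, n_teams=2,
                               n_stats=5, dim=8)
    match = [(7, 3, 0, [90.0, 1.0, 0.0, 42.0, 0.8]),   # hypothetical player tokens:
             (12, 10, 1, [75.0, 0.0, 1.0, 30.0, 0.6])]  # (id, position, team, stats)
    seq = emb.embed_match(match)
    print(len(seq), len(seq[0]))  # 2 8
```

Because every term is a plain lookup or a linear map, the same player ID produces a different input vector whenever the position, team, or match statistics change, which is how match context enters the representation before any attention is applied.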