FootBots: A Transformer-based Architecture for Motion Prediction in Soccer

Guillem Capellera, Luis Ferraz, Antonio Rubio, Antonio Agudo, Francesc Moreno-Noguer

TL;DR

FootBots is an encoder-decoder transformer-based architecture that addresses motion prediction and conditioned motion prediction through equivariance properties. It outperforms baselines in motion prediction and excels in conditioned tasks.

Abstract

Motion prediction in soccer involves capturing complex dynamics from player and ball interactions. We present FootBots, an encoder-decoder transformer-based architecture addressing motion prediction and conditioned motion prediction through equivariance properties. FootBots captures temporal and social dynamics using set attention blocks and a multi-attention block decoder. Our evaluation uses two datasets: a real soccer dataset and a tailored synthetic one. Insights from the synthetic dataset highlight the effectiveness of FootBots' social attention mechanism and the significance of conditioned motion prediction. Empirical results on real soccer data demonstrate that FootBots outperforms baselines in motion prediction and excels in conditioned tasks, such as predicting the players based on the ball position, predicting the offensive (defensive) team based on the ball and the defensive (offensive) team, and predicting the ball position based on all players. Our evaluation connects quantitative and qualitative findings. Video: https://youtu.be/9kaEkfzG3L8
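To make the equivariance idea concrete, here is a minimal sketch of single-head self-attention applied across the agent axis, followed by a check that permuting the agents permutes the output in the same way. It is an illustration only, not the FootBots implementation: the abstract does not spell out which equivariance is meant, so permutation equivariance over agents is an assumption, and the function names, feature size, and random weights are all hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def social_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over agents; x has one feature row per agent.

    No positional encoding is added on the agent axis, which is what makes
    the operation permutation-equivariant.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v)


rng = random.Random(0)
num_agents, dim = 23, 4  # illustrative: 22 players plus the ball, 4 features each


def random_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


x = random_matrix(num_agents, dim)
w_q, w_k, w_v = (random_matrix(dim, dim) for _ in range(3))

perm = list(range(num_agents))
rng.shuffle(perm)

out = social_self_attention(x, w_q, w_k, w_v)
out_perm = social_self_attention([x[i] for i in perm], w_q, w_k, w_v)

# Equivariance: attending over shuffled agents gives the shuffled original output.
equivariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for j, i in enumerate(perm)
    for a, b in zip(out_perm[j], out[i])
)
```

Because the attention weights depend only on pairwise feature similarity, the order in which players and the ball are listed does not affect what is predicted for each agent.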

Paper Structure

This paper contains 12 sections, 9 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Motion prediction in soccer. The method predicts both player and ball motions from partial 2D trajectories under specified conditions. In the figure, squares represent the ground-truth end positions of offensive and defensive team players, crosses denote their predicted positions, and circles indicate the final ball positions. Five different tasks (MP, CMP$_{1-4}$) are displayed for the same test sequence, each tailored to predict a specific subset of agents, as specified in parentheses.
  • Figure 2: FootBots architecture in soccer. FootBots exploits an encoder-decoder structure with sequential temporal and social attention mechanisms. It incorporates Set Attention Blocks to encode the temporal (SAB$_T$) and social (SAB$_S$) dynamics represented in the context $\mathcal{C}$. The Multi-Attention Block Decoder in the temporal axis (MABD$_T$) and SAB$_S$ in the decoder generate the predicted trajectories. FootBots can solve both MP and CMP tasks in soccer, with the decoder input $\mathcal{H}$ varying with the task.
  • Figure 3: Two examples from the synthetic dataset. The examples visually compare the performance of FootBots NS and FootBots on the MP task with that of FootBots on the CMP$_1$ task. The predictions for different player types (S, L, and A) are evaluated, emphasizing the impact of incorporating social attention and of using the ball as the conditioning agent.
  • Figure 4: Qualitative evaluation and comparison on real data. The figure displays the estimated trajectories for the baselines Velocity, RNN [becker2018red], baller2vec++ [alcorn2021baller2vec++], and siMLPe [guo2023back], and for our solutions FootBots NS and FootBots, on the MP task.
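
The tasks named in the abstract and in Figure 1 differ only in which agents are given and which are predicted. The sketch below shows one way to express that split as index lists. It is a hypothetical illustration: the task names are invented here and are deliberately not mapped to the CMP$_{1-4}$ indices, the 11-versus-11 role layout is an assumption, and "conditioned on" is assumed to mean that the conditioning agents are supplied as input rather than predicted.

```python
# Hypothetical task names, each mapped to (conditioning roles, target roles).
# The roles follow the abstract: ball, offensive team, defensive team.
TASKS = {
    "mp": (set(), {"ball", "offense", "defense"}),
    "players_given_ball": ({"ball"}, {"offense", "defense"}),
    "offense_given_ball_and_defense": ({"ball", "defense"}, {"offense"}),
    "defense_given_ball_and_offense": ({"ball", "offense"}, {"defense"}),
    "ball_given_players": ({"offense", "defense"}, {"ball"}),
}


def split_agents(roles, task):
    """Return (conditioning indices, target indices) for one task."""
    conditioning_roles, target_roles = TASKS[task]
    conditioning = [i for i, role in enumerate(roles) if role in conditioning_roles]
    targets = [i for i, role in enumerate(roles) if role in target_roles]
    return conditioning, targets


# Illustrative scene layout: the ball first, then 11 offensive and 11 defensive players.
roles = ["ball"] + ["offense"] * 11 + ["defense"] * 11

cond, targets = split_agents(roles, "offense_given_ball_and_defense")
```

In plain motion prediction (`"mp"`) the conditioning list is empty and every agent is a target, while each conditioned task moves some roles from the target side to the conditioning side.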