Action Anticipation from SoccerNet Football Video Broadcasts

Mohamad Dalal, Artur Xarles, Anthony Cioppa, Silvio Giancola, Marc Van Droogenbroeck, Bernard Ghanem, Albert Clapés, Sergio Escalera, Thomas B. Moeslund

TL;DR

This work defines action anticipation in football broadcasts as predicting future ball-related actions within a fixed anticipation window using only past context. It introduces the SoccerNet Ball Action Anticipation dataset (SN-BAA) and FAANTRA, a transformer-based end-to-end baseline adapted from FUTR, together with two new evaluation metrics: mAP@$\delta$, which captures temporal precision, and mAP@$\infty$, which captures occurrence. Through comprehensive ablations, the study shows the importance of high spatial resolution and an auxiliary segmentation task, and demonstrates that accurate anticipation in fast-paced football video is feasible yet challenging. The dataset and code release aim to promote reproducibility and advance predictive analytics for automated broadcasting, tactical analysis, and player decision-support systems.

Abstract

Artificial intelligence has revolutionized the way we analyze sports videos, whether to understand the actions of games in long untrimmed videos or to anticipate players' motion in future frames. Despite these efforts, little attention has been given to anticipating game actions before they occur. In this work, we introduce the task of action anticipation for football broadcast videos, which consists of predicting future actions in unobserved future frames, within a five- or ten-second anticipation window. To benchmark this task, we release a new dataset, namely the SoccerNet Ball Action Anticipation dataset, based on SoccerNet Ball Action Spotting. Additionally, we propose a Football Action ANticipation TRAnsformer (FAANTRA), a baseline method that adapts FUTR, a state-of-the-art action anticipation model, to predict ball-related actions. To evaluate action anticipation, we introduce new metrics, including mAP@$\delta$, which evaluates the temporal precision of predicted future actions, as well as mAP@$\infty$, which evaluates their occurrence within the anticipation window. We also conduct extensive ablation studies to examine the impact of various task settings, input configurations, and model architectures. Experimental results highlight both the feasibility and challenges of action anticipation in football videos, providing valuable insights into the design of predictive models for sports analytics. By forecasting actions before they unfold, our work will enable applications in automated broadcasting, tactical analysis, and player decision-making. Our dataset and code are publicly available at https://github.com/MohamadDalal/FAANTRA.
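
To make the two metrics concrete, the following is a minimal sketch of a tolerance-based average precision, not the authors' evaluation code. It rests on several assumptions that this summary does not confirm: a prediction counts as correct when it lies within `delta` seconds of a still-unmatched ground-truth action of the same class, predictions are matched greedily in descending score order, and AP is computed without interpolation. The function names `average_precision` and `mean_average_precision` are hypothetical. Setting `delta` to infinity removes the temporal constraint, so only the occurrence of the action within the anticipation window matters.

```python
import math


def average_precision(preds, gts, delta):
    """AP for one class (assumed matching rule, see lead-in).

    preds: list of (time_s, score) predictions inside the anticipation window.
    gts:   list of ground-truth action times (seconds) for the same class.
    delta: temporal tolerance in seconds; math.inf ignores timing entirely.
    """
    if not gts:
        return 0.0
    matched = set()
    hits = []
    # Visit predictions from most to least confident.
    for t, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best = None
        for i, g in enumerate(gts):
            if i in matched:
                continue
            dist = abs(t - g)
            # Keep the closest unmatched ground truth within the tolerance.
            if dist <= delta and (best is None or dist < abs(t - gts[best])):
                best = i
        if best is None:
            hits.append(False)
        else:
            matched.add(best)
            hits.append(True)
    # Non-interpolated AP: sum precision at each true-positive rank,
    # then divide by the number of ground-truth actions.
    true_positives = 0
    total = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            true_positives += 1
            total += true_positives / rank
    return total / len(gts)


def mean_average_precision(preds_by_class, gts_by_class, delta):
    """Mean of per-class AP over classes that have at least one ground truth."""
    classes = [c for c, g in gts_by_class.items() if g]
    if not classes:
        return 0.0
    aps = [
        average_precision(preds_by_class.get(c, []), gts_by_class[c], delta)
        for c in classes
    ]
    return sum(aps) / len(aps)


# Toy example with made-up numbers (not results from the paper).
preds = [(1.0, 0.9), (4.0, 0.8), (2.8, 0.3)]
gts = [1.2, 3.0]
ap_tight = average_precision(preds, gts, delta=0.5)
ap_loose = average_precision(preds, gts, delta=math.inf)
```

In this toy example the second prediction is one second away from its nearest ground truth, so it is a false positive at `delta=0.5` but a true positive once the tolerance is unbounded. That is the distinction between temporal precision and occurrence described in the abstract.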

Paper Structure

This paper contains 22 sections, 14 figures, 6 tables.

Figures (14)

  • Figure 1: Overview of our new action anticipation task for sports. Action anticipation aims to predict and temporally localize future actions in an anticipation window of $T_a$ seconds using information from a preceding observed context window of $T_c$ seconds. Unlike action spotting, where models can access the entire video sequence to detect actions, action anticipation requires predicting future events without access to future frames.
  • Figure 2: FAANTRA Architecture Overview. FAANTRA processes context video frames by extracting per-frame representations through a backbone (BB). These features are fed into a transformer encoder to capture temporal dependencies. A set of learnable queries, representing action predictions, are initialized in the transformer decoder and refined through multiple layers, leveraging information from the encoder. Each refined query is then processed by a prediction head (PH) to output three components representing the anticipated actions: action detection (i.e., actionness), action class, and temporal position.
  • Figure 3: Anticipation window length analysis: Performance evaluation across different mAP@$\delta$ metrics for varying $T_a$ anticipation windows.
  • Figure 4: Context window length analysis: Performance evaluation across mAP@$\delta$ metrics for varying $T_c$ context windows.
  • Figure 5: Example of a pass action.
  • ...and 9 more figures
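
The task setup in the Figure 1 caption can be sketched as a sampling routine: given a split time, the model sees only frames from the preceding $T_c$ seconds, and its targets are the actions that fall in the following $T_a$ seconds. The code below is an illustrative sketch under stated assumptions, not the dataset's actual loader. The names `Action` and `make_sample`, the half-open window boundaries, the normalization of target times to $[0, 1)$, and the class labels other than "pass" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Action:
    time_s: float  # annotation timestamp in seconds from the start of the video
    label: str     # action class name


def make_sample(actions, split_s, t_c, t_a, fps):
    """Build one anticipation sample around the split time `split_s`.

    Returns the indices of the observed context frames, covering
    [split_s - t_c, split_s), and the future targets in
    [split_s, split_s + t_a) as (label, relative_position) pairs, where
    relative_position is the offset into the anticipation window divided
    by its length.
    """
    first_frame = round((split_s - t_c) * fps)
    last_frame = round(split_s * fps)  # exclusive: no future frames are observed
    context_frames = list(range(max(first_frame, 0), last_frame))

    targets = [
        (a.label, (a.time_s - split_s) / t_a)
        for a in actions
        if split_s <= a.time_s < split_s + t_a
    ]
    return context_frames, targets


# Toy annotations; the timestamps and labels are invented for illustration.
annotations = [
    Action(3.0, "pass"),
    Action(12.0, "drive"),
    Action(14.5, "pass"),
    Action(21.0, "shot"),
]
frames, targets = make_sample(annotations, split_s=10.0, t_c=5.0, t_a=5.0, fps=2)
```

Here the action at 3.0 s is outside both windows, the actions at 12.0 s and 14.5 s become targets, and the action at 21.0 s lies beyond the anticipation window. An action spotting model would instead have access to frames on both sides of each annotated action.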