SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap

Vladimir Somers, Victor Joos, Anthony Cioppa, Silvio Giancola, Seyed Abolfazl Ghasemzadeh, Floriane Magera, Baptiste Standaert, Amir Mohammad Mansourian, Xin Zhou, Shohreh Kasaei, Bernard Ghanem, Alexandre Alahi, Marc Van Droogenbroeck, Christophe De Vleeschouwer

TL;DR

The paper defines Game State Reconstruction (GSR) as jointly localizing and identifying all athletes on a minimap from a single broadcast video. It introduces SoccerNet-GSR, the first open dataset for GSR, along with GS-HOTA, a metric that combines localization and identification accuracy, and an end-to-end GSR baseline that integrates detection, re-identification, jersey-number recognition, team affiliation, and pitch calibration. The results demonstrate the task’s difficulty and identify calibration and jersey-number recognition as key bottlenecks, while highlighting the dataset and metric as valuable benchmarks for future research. The work enables a unified, interpretable representation of game state that can power analytics, coaching, and broadcast applications across football and other team sports.

Abstract

Tracking and identifying athletes on the pitch plays a central role in collecting essential insights from the game, such as estimating the total distance covered by players or understanding team tactics. This tracking and identification process is crucial for reconstructing the game state, defined by the athletes' positions and identities on a 2D top view of the pitch (i.e., a minimap). However, reconstructing the game state from videos captured by a single camera is challenging: it requires understanding both the positions of the athletes and the viewpoint of the camera in order to localize and identify players on the pitch. In this work, we formalize the task of Game State Reconstruction and introduce SoccerNet-GSR, a novel Game State Reconstruction dataset focusing on football videos. SoccerNet-GSR is composed of 200 video sequences of 30 seconds each, annotated with 9.37 million line points for pitch localization and camera calibration, as well as over 2.36 million athlete positions on the pitch with their respective role, team, and jersey number. Furthermore, we introduce GS-HOTA, a novel metric to evaluate game state reconstruction methods. Finally, we propose and release an end-to-end baseline for game state reconstruction to bootstrap research on this task. Our experiments show that GSR is a challenging new task that opens the field for future research. Our dataset and codebase are publicly available at https://github.com/SoccerNet/sn-gamestate.
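
To make the game-state definition and the idea behind GS-HOTA concrete, here is a minimal Python sketch. It assumes a hypothetical per-athlete record (`AthleteState`) holding a pitch position in meters plus role, team, and jersey number, and it assumes the GS-HOTA similarity is the product of a localization term and a binary identification term. The localization term is assumed to be the Gaussian-shaped score $\exp(\ln(0.05)\, d^2 / \tau^2)$, which equals 0.05 at a distance of $\tau = 5$ meters; the exact formulation should be checked against the paper and the sn-gamestate code.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AthleteState:
    """One athlete in one frame of a hypothetical game-state record.

    Field names are illustrative, not the SoccerNet-GSR annotation schema.
    """
    track_id: int
    x: float               # pitch coordinate in meters
    y: float               # pitch coordinate in meters
    role: str              # e.g. "player", "goalkeeper", "referee"
    team: Optional[str]    # e.g. "left" / "right"; None if not applicable
    jersey: Optional[int]  # None if the number is unknown or not applicable


def loc_similarity(pred: AthleteState, gt: AthleteState, tau: float = 5.0) -> float:
    """Assumed Gaussian-shaped score: 1 at distance 0, exactly 0.05 at distance tau."""
    d2 = (pred.x - gt.x) ** 2 + (pred.y - gt.y) ** 2
    return math.exp(math.log(0.05) * d2 / tau ** 2)


def id_similarity(pred: AthleteState, gt: AthleteState) -> float:
    """Assumed binary score: 1 only if role, team, and jersey number all match."""
    same = (pred.role, pred.team, pred.jersey) == (gt.role, gt.team, gt.jersey)
    return 1.0 if same else 0.0


def gs_similarity(pred: AthleteState, gt: AthleteState, tau: float = 5.0) -> float:
    """Combined score: the prediction must be close AND correctly identified.

    Track IDs are deliberately not compared; association over time is scored separately.
    """
    return loc_similarity(pred, gt, tau) * id_similarity(pred, gt)


gt = AthleteState(track_id=7, x=10.0, y=-3.0, role="player", team="left", jersey=9)
pred = AthleteState(track_id=3, x=13.0, y=1.0, role="player", team="left", jersey=9)
print(round(gs_similarity(pred, gt), 3))  # 5 m away, same identity -> 0.05
```

In a HOTA-style evaluation, a score like `gs_similarity` would take the place of the bounding-box IoU when matching predictions to ground truth, so a prediction only counts if it is both close on the minimap and correctly identified; the temporal association part of HOTA is not shown here. Raising `tau` makes the localization term more forgiving, which is the trade-off that Figure 4 examines.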

Paper Structure (22 sections, 5 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: SoccerNet-GSR. We introduce a novel Game State Reconstruction task, dataset, evaluation metric, and baseline. Our SoccerNet-GSR dataset contains unique identities for players, along with their localization on the pitch, for 200 video sequences.
  • Figure 2: The localization similarity function for $\tau = 5$ meters.
  • Figure 3: Architecture overview of our proposed baseline. GSR-Baseline takes a video as input and outputs the complete game state. Two modules are first applied to the input images: an object detector and a pitch localization model. Then, PRTreID [Mansourian2023Multitask] produces, for each detection, a re-identification (ReID) embedding that is identity-, team-, and role-aware. These embeddings are forwarded to subsequent modules that perform role classification, team affiliation, and multi-object tracking. Finally, the pitch localization output is used for camera calibration, which enables the tracked bounding boxes to be transformed into 2D positions in the pitch coordinate system.
  • Figure 4: Influence of the distance tolerance parameter $\tau$ on the GS-HOTA score. We pick $\tau = 5$ meters, illustrated by the orange line.
  • Figure 5: Qualitative results. Output predictions for two frames from videos with different GS-HOTA values. (Top) High GS-HOTA (49.69%), with robust pitch localization and accurate athlete identification. (Bottom) A calibration failure (e.g., due to too few visible pitch elements) leads to completely erroneous athlete localization and a poor GS-HOTA (0.23%).
  • ...and 3 more figures
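
Figure 3 describes using the camera calibration to turn tracked bounding boxes into minimap positions. The following is a minimal Python sketch of that last step under two assumptions: calibration yields a 3×3 homography `H` mapping image pixels to pitch coordinates (a plane-to-plane mapping, valid because the pitch is flat), and each athlete's ground position is the bottom-center of their box. The function names and the toy homography are hypothetical; the baseline itself may use a different camera model.

```python
from typing import Iterable, List, Sequence, Tuple

# A 3x3 image-to-pitch homography, stored row-major as nested sequences.
Homography = Sequence[Sequence[float]]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels, y pointing down


def foot_point(box: Box) -> Tuple[float, float]:
    """Bottom-center of the box, assumed to be where the athlete touches the pitch."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, y2)


def apply_homography(H: Homography, u: float, v: float) -> Tuple[float, float]:
    """Map pixel (u, v) to pitch coordinates via homogeneous coordinates."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        # The point lies on the horizon line of H: no finite pitch position exists.
        raise ValueError("point maps to infinity; calibration is likely invalid")
    return (x / w, y / w)


def boxes_to_pitch(H: Homography, boxes: Iterable[Box]) -> List[Tuple[float, float]]:
    """Project every tracked box onto the pitch (hypothetical helper)."""
    return [apply_homography(H, *foot_point(b)) for b in boxes]


# Toy, purely affine homography (hypothetical): 0.1 m per pixel, with the origin
# shifted so that pixel (525, 340) lands on pitch coordinate (0, 0).
H_toy = [[0.1, 0.0, -52.5],
         [0.0, 0.1, -34.0],
         [0.0, 0.0, 1.0]]
print(boxes_to_pitch(H_toy, [(500, 300, 540, 400)]))  # [(-0.5, 6.0)]
```

The division by `w` is what makes this a projective rather than an affine mapping; a real broadcast calibration has a non-trivial third row. Because every foot point is pushed through the same `H`, an error in the calibration, such as the failure shown in Figure 5, propagates directly to every projected athlete position.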