From Broadcast to Minimap: Achieving State-of-the-Art SoccerNet Game State Reconstruction
Vladimir Golovkin, Nikolay Nemtsev, Vasyl Shandyba, Oleg Udin, Nikita Kasatkin, Pavel Kononov, Anton Afanasiev, Sergey Ulasen, Andrei Boiarov
TL;DR
The paper addresses the reconstruction of accurate game-state information (player positions, roles, teams, and jersey numbers) from single-camera football broadcasts, and presents a modular pipeline that outputs world-coordinate trajectories suitable for minimap representations. The pipeline combines a fine-tuned object detector (YOLOv5m), a SegFormer-based camera parameter estimator with Field Keypoints refinement, and a DeepSORT-based tracker augmented with ReID, TeamID, and jersey-number recognition, followed by a multi-stage post-processing step that merges fragmented tracklets. Key contributions include the SegFormer-based camera parameter regression with keypoint-based refinement, a five-cluster team-detection scheme, a post-processing pipeline that significantly reduces tracklet fragmentation, and real-time performance on consumer hardware. The method achieves a GS-HOTA score of 63.81 and first place in SoccerNet GSR 2024. The results show strong gains from integrating detection, localization, and identity modeling, enabling reliable minimap-based game state reconstruction with practical applications in coaching analytics and tactical decision-making in football.
Abstract
Game State Reconstruction (GSR), a critical task in Sports Video Understanding, involves precise tracking and localization of all individuals on the football field (players, goalkeepers, referees, and others) in real-world coordinates. This capability enables coaches and analysts to derive actionable insights into player movements, team formations, and game dynamics, ultimately optimizing training strategies and enhancing competitive advantage. Achieving accurate GSR with a single-camera setup is highly challenging due to frequent camera movements, occlusions, and dynamic scene content. In this work, we present a robust end-to-end pipeline for tracking players across an entire match using a single-camera setup. Our solution integrates a fine-tuned YOLOv5m for object detection, a SegFormer-based camera parameter estimator, and a DeepSORT-based tracking framework enhanced with re-identification, orientation prediction, and jersey number recognition. By ensuring both spatial accuracy and temporal consistency, our method delivers state-of-the-art game state reconstruction, securing first place in the SoccerNet Game State Reconstruction Challenge 2024 and significantly outperforming competing methods.
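To make the localization step concrete, the following is a minimal sketch of how a detected bounding box could be placed on a minimap once the camera is known. It is not the authors' implementation. It assumes that the estimated camera parameters have already been reduced to a 3x3 image-to-pitch homography `H` (valid because the pitch is planar) and that a person's ground position is approximated by the bottom-centre of the bounding box. The names `foot_point`, `image_to_pitch`, and the example matrix are hypothetical.

```python
import numpy as np


def foot_point(box):
    """Return the bottom-centre (x, y) of an (x1, y1, x2, y2) pixel box.

    Assumption: the bottom-centre approximates where the person touches the pitch.
    """
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, y2)


def image_to_pitch(points_px, H):
    """Map N x 2 pixel points to pitch-plane coordinates with a 3 x 3 homography H."""
    pts = np.asarray(points_px, dtype=float).reshape(-1, 2)
    ones = np.ones((pts.shape[0], 1))
    homog = np.hstack([pts, ones])           # lift to homogeneous coordinates
    mapped = homog @ np.asarray(H, dtype=float).T
    return mapped[:, :2] / mapped[:, 2:3]    # divide by the projective scale


# Hypothetical homography: a pure scaling of 0.1 pitch units per pixel.
H_example = np.diag([0.1, 0.1, 1.0])
boxes = [(100, 50, 120, 200), (300, 80, 330, 260)]
feet = [foot_point(b) for b in boxes]
pitch_xy = image_to_pitch(feet, H_example)
```

In a real broadcast the homography changes every frame as the camera pans and zooms, so `H` would be recomputed per frame from the estimated camera parameters before the trajectories are drawn on the minimap.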
