SoccerNet-v3D: Leveraging Sports Broadcast Replays for 3D Scene Understanding
Marc Gutiérrez-Pérez, Antonio Agudo
TL;DR
This work introduces SoccerNet-v3D and ISSIA-3D, two publicly available datasets for 3D scene understanding in soccer broadcasts. Both rely on field-line-based camera calibration and multi-view synchronization to localize the ball in 3D through triangulation. The work proposes three things: a monocular 3D ball localization task grounded in multi-view triangulation, calibration and reprojection metrics to assess annotation quality, and a bounding-box optimization method that keeps 2D annotations consistent with the 3D scene. SoccerNet-v3D is built from synchronized main-camera and replay frames, and ISSIA-3D from six static, synchronized cameras. Cameras are robustly calibrated with PnLCalib, and 3D localization is validated against ground-truth ball positions. Experimental results show that the optimized bounding boxes improve both 2D detection and 3D localization accuracy, establishing new benchmarks for monocular 3D ball localization and supporting further research on 3D soccer scene understanding and tracking.
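Triangulation is what turns per-view 2D ball annotations into a single 3D position. The following is a minimal sketch, assuming each view already has a 3x4 projection matrix P = K[R | t] (for example, derived from a field-line calibration; PnLCalib's actual output format is not shown here). It uses the standard linear DLT method, which may differ from the solver the authors use, and reports the per-view reprojection error in pixels as one natural form of a reprojection metric. The helper names `camera_matrix`, `project`, `triangulate_dlt`, and `reprojection_errors` are hypothetical.

```python
import numpy as np


def camera_matrix(f, cx, cy, R, C):
    """Build P = K [R | -R C] for a pinhole camera with center C (world frame)."""
    K = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])
    t = -R @ C
    return K @ np.hstack([R, t[:, None]])


def project(P, X):
    """Project a 3D world point to pixel coordinates (u, v)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from N >= 2 calibrated views."""
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view gives two linear constraints on the homogeneous point X:
        # u * (P[2] @ X) - P[0] @ X = 0 and v * (P[2] @ X) - P[1] @ X = 0
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # Least-squares solution: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize


def reprojection_errors(projections, points_2d, X):
    """Per-view pixel distance between each 2D annotation and the reprojected point."""
    return np.array([
        np.linalg.norm(project(P, X) - np.asarray(uv, dtype=float))
        for P, uv in zip(projections, points_2d)
    ])
```

With noise-free annotations the reprojection error is essentially zero. With real annotations, a large error in one view suggests either a poor calibration or a misplaced box in that view, which is the kind of signal an annotation-quality metric can flag.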
Abstract
Sports video analysis is a key domain in computer vision, enabling detailed spatial understanding through multi-view correspondences. In this work, we introduce SoccerNet-v3D and ISSIA-3D, two enhanced and scalable datasets designed for 3D scene understanding in soccer broadcast analysis. These datasets extend SoccerNet-v3 and ISSIA by incorporating field-line-based camera calibration and multi-view synchronization, enabling 3D object localization through triangulation. We propose a monocular 3D ball localization task built upon the triangulation of ground-truth 2D ball annotations, along with several calibration and reprojection metrics to assess annotation quality on demand. Additionally, we present a single-image 3D ball localization method as a baseline, leveraging camera calibration and ball size priors to estimate the ball's position from a monocular viewpoint. To further refine 2D annotations, we introduce a bounding box optimization technique that ensures alignment with the 3D scene representation. Our proposed datasets establish new benchmarks for 3D soccer scene understanding, enhancing both spatial and temporal analysis in sports analytics. Finally, we provide code to facilitate access to our annotations and the generation pipelines for the datasets.
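The single-image baseline can be illustrated with the pinhole relation between object size and depth: a ball of diameter D at depth Z appears roughly d = fD/Z pixels wide, so Z ≈ fD/d. The sketch below inverts that relation, and also goes the other way by projecting a 3D ball back to the box it should occupy, which is one simple way to check whether a 2D box agrees with the 3D scene. It is a sketch under stated assumptions, not the authors' baseline or their bounding-box optimization. The 0.22 m ball diameter, the use of the mean box side as the apparent diameter, and the helper names `ball_from_bbox` and `bbox_from_ball` are all assumptions, and the elliptical shape of an off-axis projected sphere is ignored.

```python
import numpy as np

# Assumption: approximate diameter of a FIFA size-5 ball, in meters.
BALL_DIAMETER_M = 0.22


def _focal(K):
    # Average of fx and fy; assumes near-square pixels.
    return (K[0, 0] + K[1, 1]) / 2.0


def ball_from_bbox(K, R, C, bbox, diameter=BALL_DIAMETER_M):
    """Estimate the 3D ball center (world frame) from one (x0, y0, x1, y1) box.

    K: 3x3 intrinsics, R: 3x3 world-to-camera rotation, C: camera center (world).
    """
    x0, y0, x1, y1 = bbox
    u, v = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    d_px = ((x1 - x0) + (y1 - y0)) / 2.0            # apparent diameter (mean box side)
    depth = _focal(K) * diameter / d_px              # Z ~ f * D / d
    ray = np.linalg.solve(K, np.array([u, v, 1.0]))  # z-component is 1 for a standard K
    X_cam = depth * ray                              # ball center in the camera frame
    return R.T @ X_cam + C                           # camera frame -> world frame


def bbox_from_ball(K, R, C, X, diameter=BALL_DIAMETER_M):
    """Square box that a ball centered at world point X should occupy in the image."""
    X_cam = R @ (np.asarray(X, dtype=float) - C)
    x = K @ X_cam
    u, v = x[0] / x[2], x[1] / x[2]
    half = _focal(K) * diameter / X_cam[2] / 2.0     # half of f * D / Z
    return (u - half, v - half, u + half, v + half)
```

Because depth is inversely proportional to the box size, the relative depth error roughly equals the relative size error. For a ball about 10 px wide, a 1 px error in the box already shifts the estimated depth by about 10%, which is one reason box quality matters for monocular 3D localization.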
