SoccerSynth-Detection: A Synthetic Dataset for Soccer Player Detection
Haobin Qin, Calvin Yeung, Rikuhei Umemoto, Keisuke Fujii
TL;DR
The paper addresses the scarcity of diverse soccer player detection data, which stems from frequent occlusions and copyright constraints. It does so by introducing SoccerSynth-Detection, a synthetic dataset generated with an Unreal Engine-based simulator that applies extensive domain randomization and motion blur. The authors validate the approach with YOLOv8n. Models trained on the synthetic data generalize well to real-world datasets and perform particularly well on motion-blurred images, and pretraining on the synthetic data boosts performance in low-data scenarios. Key contributions include the dataset, the simulator enhancements, and the demonstration that synthetic data can replace or augment real data for detection training, supported by a publicly available dataset generator. The findings suggest that synthetic data can substantially mitigate data scarcity in sports video analysis and accelerate the development of robust detection methods. Overall, SoccerSynth-Detection offers a practical path toward scalable, labeled data for soccer analytics, with implications for transfer learning and domain adaptation in sports AI.
Abstract
In soccer video analysis, player detection is essential for identifying key events and reconstructing tactical positions. The large number of players and frequent occlusions, combined with copyright restrictions, severely limit the availability of datasets, leaving only a few options such as SoccerNet-Tracking and SportsMOT. These datasets lack diversity, which hinders algorithms from adapting effectively to varied soccer video contexts. To address these challenges, we developed SoccerSynth-Detection, the first synthetic dataset designed for soccer player detection. It features a broad range of randomized lighting and textures, as well as simulated camera motion blur. We validated its efficacy using the object detection model YOLOv8n against real-world datasets (SoccerNet-Tracking and SportsMOT). In transfer tests, the model trained on our dataset matched the performance of models trained on the real datasets and significantly outperformed them on images with motion blur. In pre-training tests, SoccerSynth-Detection proved effective as a pre-training dataset, significantly improving the algorithm's overall performance. Our work demonstrates the potential of synthetic datasets to replace real datasets for algorithm training in soccer video analysis.
