Hand Held Multi-Object Tracking Dataset in American Football
Rintaro Otsubo, Kanta Sawafuji, Hideo Saito
TL;DR
This work tackles the lack of a standardized MOT benchmark for American football by introducing the first two-view, hand-held camera dataset, with roughly 200k bounding boxes across more than 10k frames. It evaluates a range of detection and tracking approaches, showing that fine-tuning detectors and ReID models leads to substantial improvements in both detection and tracking accuracy, with BoT-SORT plus a fine-tuned detector delivering the best overall MOT performance. The study highlights the pivotal role of accurate detection in occlusion-heavy football scenes and demonstrates that OSNet can outperform transformer-based ReID in this domain. Overall, the dataset and experimental findings provide a valuable resource for device-free, vision-based analysis of football games and a benchmark for cross-domain MOT research.
Abstract
Multi-Object Tracking (MOT) plays a critical role in analyzing player behavior from videos, enabling performance evaluation. Current MOT methods are often evaluated using publicly available datasets. However, most of these focus on everyday scenarios such as pedestrian tracking or are tailored to specific sports, including soccer and basketball. Despite the inherent challenges of tracking players in American football, such as frequent occlusion and physical contact, no standardized dataset has been publicly available, making fair comparisons between methods difficult. To address this gap, we constructed the first dedicated detection and tracking dataset for American football players and conducted a comparative evaluation of various detection and tracking methods. Our results demonstrate that accurate detection and tracking can be achieved even in crowded scenarios. Fine-tuning detection models improved performance over pre-trained models. Furthermore, when these fine-tuned detectors and re-identification models were integrated into tracking systems, we observed notable improvements in tracking accuracy compared to existing approaches. This work thus enables robust detection and tracking of American football players in challenging, high-density scenarios previously underserved by conventional methods.
