RacketVision: A Multiple Racket Sports Benchmark for Unified Ball and Racket Analysis
Linfeng Dong, Yuchen Yang, Hao Wu, Wei Wang, Yuenan Hou, Zhihang Zhong, Xiao Sun
TL;DR
RacketVision introduces a large-scale, multi-sport benchmark for unified ball and racket analysis across badminton, tennis, and table tennis, with three interconnected tasks: ball tracking, racket pose estimation, and ball trajectory prediction. The dataset provides pixel-level ball and racket annotations, along with a two-stage annotation pipeline and a three-task training workflow. A central finding is that naive fusion of racket pose features harms trajectory prediction, whereas Cross-Attention fusion enables robust integration of racket cues, improving performance over strong unimodal baselines. The work demonstrates that multi-sport training fosters generalization and establishes a new public resource for dynamic object tracking and multimodal sports analytics.
Abstract
We introduce RacketVision, a novel dataset and benchmark for advancing computer vision in sports analytics, covering table tennis, tennis, and badminton. The dataset is the first to provide large-scale, fine-grained annotations for racket pose alongside traditional ball positions, enabling research into complex human-object interactions. It is designed to tackle three interconnected tasks: fine-grained ball tracking, articulated racket pose estimation, and predictive ball trajectory forecasting. Our evaluation of established baselines reveals a critical insight for multimodal fusion: naively concatenating racket pose features degrades performance, whereas a Cross-Attention mechanism is essential to unlock their value, leading to trajectory prediction results that surpass strong unimodal baselines. RacketVision provides a versatile resource and a strong starting point for future research in dynamic object tracking, conditional motion forecasting, and multimodal analysis in sports. Project page: https://github.com/OrcustD/RacketVision
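To make the contrast between the two fusion strategies concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes ball features of shape (T, d), one token per time step, and racket pose features of shape (M, d), one token per keypoint or frame. The function names, the mean-pooling step in the naive variant, the single attention head, and the random projection matrices are all illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def concat_fusion(ball_feats, racket_feats):
    """Naive fusion (assumed form): pool the racket tokens into one vector
    and append that same vector to every ball token."""
    pooled = racket_feats.mean(axis=0, keepdims=True)        # (1, d)
    tiled = np.repeat(pooled, ball_feats.shape[0], axis=0)   # (T, d)
    return np.concatenate([ball_feats, tiled], axis=-1)      # (T, 2d)


def cross_attention_fusion(ball_feats, racket_feats, w_q, w_k, w_v):
    """Single-head cross-attention: ball tokens are the queries,
    racket tokens supply the keys and values."""
    q = ball_feats @ w_q                         # (T, d)
    k = racket_feats @ w_k                       # (M, d)
    v = racket_feats @ w_v                       # (M, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (T, M) scaled dot products
    weights = softmax(scores, axis=-1)           # each row sums to 1
    fused = ball_feats + weights @ v             # residual keeps the ball signal
    return fused, weights


# Toy shapes and random inputs, chosen only for illustration.
rng = np.random.default_rng(0)
T, M, d = 8, 5, 16
ball = rng.normal(size=(T, d))
racket = rng.normal(size=(M, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

naive = concat_fusion(ball, racket)
fused, attn = cross_attention_fusion(ball, racket, w_q, w_k, w_v)
print(naive.shape, fused.shape, attn.shape)
```

In this sketch, concatenation gives every time step the same racket summary, while cross-attention lets each ball token weight the racket tokens differently and leaves the ball features intact through the residual path. A trained model would learn the projection matrices rather than sample them.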
