YO-CSA-T: A Real-time Badminton Tracking System Utilizing YOLO Based on Contextual and Spatial Attention
Yuan Lai, Zhiwei Shi, Chengxi Zhu
TL;DR
This work tackles the problem of real-time 3D shuttlecock trajectory tracking for badminton robotics. It introduces YO-CSA-T, a YOLOv8s-based detector enhanced with a Contextual Transformer Block (CoT2f) and a Spatial Attention-Integrated Neck (SANeck), plus a decoupled head, to robustly detect the small, fast shuttlecock. The system maps 2D detections to 3D coordinates $(x,y,z)$ via stereo vision, predicts future positions, and uses a compensation module to interpolate missing frames, achieving 90.43% mAP@0.75 and real-time performance (>130 fps) on a dataset of 32,539 images. The approach enables accurate, real-time 3D trajectory extraction, with implications for robotic control, match analysis, and automated coaching in badminton.
Abstract
A badminton rally robot built for human-robot competition needs the 3D trajectory of the shuttlecock in real time and with high accuracy. However, the shuttlecock's high flight speed, various visual effects, and its tendency to blend with environmental elements such as court lines and lighting make rapid and accurate 2D detection challenging. In this paper, we first propose the YO-CSA detection network, which optimizes and reconfigures the backbone, neck, and head of the YOLOv8s model by incorporating contextual and spatial attention mechanisms, enhancing the model's ability to extract and integrate both global and local features. Next, we integrate three major subtasks (detection, prediction, and compensation) into a real-time 3D shuttlecock trajectory detection system. Specifically, our system maps the 2D coordinate sequence extracted by YO-CSA into 3D space using stereo vision, predicts future 3D coordinates from historical information, and re-projects them onto the left and right views to update the position constraints for 2D detection. Additionally, our system includes a compensation module that fills in missing intermediate frames, yielding a more complete trajectory. We conduct extensive experiments on our own dataset to evaluate both the performance of YO-CSA and the effectiveness of the system. Experimental results show that YO-CSA achieves 90.43% mAP@0.75, surpassing both YOLOv8s and YOLO11s, and that the full system maintains a speed of over 130 fps across 12 test sequences.
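The geometric steps named in the abstract (stereo mapping from 2D to 3D, re-projection onto both views, and filling of missing frames) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard linear (DLT) triangulation, a toy rectified camera pair with made-up intrinsics and a 0.5 m baseline, and plain linear interpolation standing in for the compensation module. All function names (`triangulate`, `project`, `fill_gaps`) and all numeric values are hypothetical.

```python
import numpy as np


def triangulate(P_left, P_right, uv_left, uv_right):
    """Linear (DLT) triangulation of one 3D point from two pixel observations.

    P_left, P_right: 3x4 camera projection matrices.
    uv_left, uv_right: (u, v) pixel coordinates of the same point in each view.
    """
    rows = []
    for P, (u, v) in ((P_left, uv_left), (P_right, uv_right)):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, xyz):
    """Project a 3D point into a view, returning (u, v) pixel coordinates."""
    x = P @ np.append(xyz, 1.0)
    return x[:2] / x[2]


def fill_gaps(track):
    """Fill missing frames (NaN rows) of an (N, 3) track by linear interpolation.

    Assumption: linear interpolation is only a stand-in for the paper's
    compensation module. Gaps at either end are held at the nearest known value.
    """
    track = np.asarray(track, dtype=float)
    t = np.arange(len(track))
    known = ~np.isnan(track).any(axis=1)
    out = track.copy()
    for k in range(3):
        out[:, k] = np.interp(t, t[known], track[known, k])
    return out


# Hypothetical rectified stereo rig: shared intrinsics K, 0.5 m horizontal baseline.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P_left = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_right = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])
```

In use, a matched pair of 2D detections from the left and right views would be passed to `triangulate`, and a predicted 3D position would be passed to `project` with each camera matrix to obtain the image regions in which to search next. Real detections are noisy, so a production system would likely add lens undistortion and a nonlinear refinement step, which this sketch omits.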
