See It Before You Grab It: Deep Learning-based Action Anticipation in Basketball
Arnau Barrera Roy, Albert Clapés Sintes
TL;DR
This paper introduces action anticipation in basketball: predicting which team will secure possession after a missed shot. It relies on a self-curated NBA Rebounds dataset with over 100,000 videos and more than 2,000 timestamp-annotated rebounds. The authors propose a Transformer-Encoder Anticipation Model (TEAM), built on an X3D_m backbone, for online anticipation and compare it against a strong baseline; they also explore two auxiliary tasks, action classification and action spotting. The study shows that anticipating rebounds is feasible but challenging, reports extensive offline and online experiments, and analyzes human versus AI performance, interpretability, and data-augmentation strategies. The results offer insights into predictive modeling for dynamic multi-agent sports and point toward real-time broadcasting tools and post-game analysis. Suggested future improvements include ball-tracking cues and uncertainty-aware heads.
Abstract
Computer vision and video understanding have transformed sports analytics by enabling large-scale, automated analysis of game dynamics from broadcast footage. Despite significant advances in player and ball tracking, pose estimation, action localization, and automatic foul recognition, anticipating actions before they occur in sports videos has received comparatively little attention. This work introduces the task of action anticipation in basketball broadcast videos, focusing on predicting which team will gain possession of the ball following a shot attempt. To benchmark this task, a new self-curated dataset comprising 100,000 basketball video clips, over 300 hours of footage, and more than 2,000 manually annotated rebound events is presented. Comprehensive baseline results are reported using state-of-the-art action anticipation methods, representing the first application of deep learning techniques to basketball rebound prediction. Additionally, two complementary tasks, rebound classification and rebound spotting, are explored, demonstrating that this dataset supports a wide range of video understanding applications in basketball, for which no comparable datasets currently exist. Experimental results highlight both the feasibility and inherent challenges of anticipating rebounds, providing valuable insights into predictive modeling for dynamic multi-agent sports scenarios. By forecasting team possession before rebounds occur, this work enables applications in real-time automated broadcasting and post-game analysis tools to support decision-making.
