SGA-INTERACT: A 3D Skeleton-based Benchmark for Group Activity Understanding in Modern Basketball Tactic
Yuchen Yang, Wei Wang, Yifei Liu, Linfeng Dong, Hao Wu, Mingxin Zhang, Zhihang Zhong, Xiao Sun
TL;DR
SGA-INTERACT introduces the first large-scale 3D skeleton-based benchmark for group activity understanding in modern basketball, together with a new Temporal Group Activity Localization (TGAL) task for untrimmed sequences. The accompanying One2Many framework follows the design of RGB-based methods: a pretrained individual-action backbone extracts per-player features from skeletons, and a temporal-spatial module (STAtt) aggregates them into group representations, so that both GAR and TGAL can be evaluated on skeleton data. The dataset provides high-quality 3D skeletons from multi-view motion capture, 18 tactical categories with long temporal dependencies, and precise temporal boundaries. Extensive ablations show the value of 3D data, extra information fusion, and pretrained backbones. Overall, SGA-INTERACT and One2Many establish a demanding benchmark that motivates advances in spatial-temporal modeling for group activities in sports and bridges RGB- and skeleton-based methodologies, with practical implications for robust, view-invariant understanding.
Abstract
Group Activity Understanding is predominantly studied as a Group Activity Recognition (GAR) task. However, existing GAR benchmarks are limited by coarse-grained activity vocabularies and single-view-only data, both of which hinder the evaluation of state-of-the-art algorithms. To address these limitations, we introduce SGA-INTERACT, the first 3D skeleton-based benchmark for group activity understanding. It features complex activities inspired by basketball tactics, emphasizing rich spatial interactions and long-term dependencies. SGA-INTERACT also introduces the Temporal Group Activity Localization (TGAL) task, which extends group activity understanding to untrimmed sequences and fills the gap left by treating GAR as a standalone task. In addition to the benchmark, we propose One2Many, a novel framework that employs a pretrained 3D skeleton backbone for unified individual feature extraction. This framework aligns with the feature extraction paradigm of RGB-based methods, enabling direct evaluation of RGB-based models on skeleton-based benchmarks. We conduct extensive evaluations on SGA-INTERACT using two skeleton-based methods, three RGB-based methods, and a proposed baseline within the One2Many framework. The generally low performance of the baselines highlights the benchmark's challenges and motivates further advances in group activity understanding.
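
The two-stage idea behind One2Many (per-player features first, group representation second) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The names `individual_backbone` and `group_representation` are hypothetical, a random linear projection stands in for the pretrained skeleton backbone, and a parameter-free self-attention stands in for the group module, whose actual design is not specified here. The tensor sizes (10 players, 17 joints, 16 frames, 32 feature dimensions) are likewise assumptions; only the 18 classes come from the text above.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def individual_backbone(skeletons, w_embed):
    """Hypothetical stand-in for a pretrained individual-action backbone.

    skeletons: (T, N, J, 3) 3D joints for N players over T frames.
    Returns per-player, per-frame features of shape (T, N, D).
    """
    T, N, J, _ = skeletons.shape
    flat = skeletons.reshape(T, N, J * 3)
    return np.tanh(flat @ w_embed)


def self_attention(x):
    """Parameter-free scaled dot-product self-attention.

    x: (B, L, D); attention runs over the L axis within each batch item.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (B, L, L)
    return softmax(scores, axis=-1) @ x               # (B, L, D)


def group_representation(player_feats):
    """Aggregate (T, N, D) player features into one (D,) group vector."""
    # Spatial step: players attend to each other within every frame.
    spatial = self_attention(player_feats)                  # (T, N, D)
    # Temporal step: each player's track attends over its frames.
    temporal = self_attention(np.swapaxes(spatial, 0, 1))   # (N, T, D)
    # Average over players and frames to get the group vector.
    return temporal.mean(axis=(0, 1))


# Toy usage with random data; all sizes except C = 18 are assumptions.
rng = np.random.default_rng(0)
T, N, J, D, C = 16, 10, 17, 32, 18
skeletons = rng.normal(size=(T, N, J, 3))
w_embed = rng.normal(scale=0.1, size=(J * 3, D))  # untrained backbone weights
w_cls = rng.normal(scale=0.1, size=(D, C))        # untrained classifier weights

feats = individual_backbone(skeletons, w_embed)
group_vec = group_representation(feats)
probs = softmax(group_vec @ w_cls)  # distribution over the 18 tactic classes
```

Because the group module only consumes a `(T, N, D)` feature tensor, any group-level model that expects per-person features, including those designed for RGB inputs, could in principle be swapped in for `group_representation` without changing the backbone.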
