SGA-INTERACT: A 3D Skeleton-based Benchmark for Group Activity Understanding in Modern Basketball Tactic

Yuchen Yang, Wei Wang, Yifei Liu, Linfeng Dong, Hao Wu, Mingxin Zhang, Zhihang Zhong, Xiao Sun

TL;DR

SGA-INTERACT introduces the first large-scale 3D skeleton-based benchmark for group activity understanding in modern basketball, featuring a novel Temporal Group Activity Localization (TGAL) task for untrimmed sequences. The One2Many framework unifies skeleton-based backbones with RGB-style design by using a pretrained individual-action backbone to extract per-player features and a temporal-spatial module (STAtt) to build group representations, enabling evaluation of both GAR and TGAL on the skeleton data. The dataset provides high-quality 3D skeletons from multi-view motion capture, 18 tactical categories with long temporal dependencies, and precise temporal boundaries, accompanied by extensive ablations showing the value of 3D data, extra information fusion, and pretrained backbones. Overall, SGA-INTERACT and One2Many establish a demanding benchmark that pushes advances in spatial-temporal modeling for group activities in sports and offers a bridge between RGB- and skeleton-based methodologies with practical implications for robust, view-invariant understanding.

Abstract

Group Activity Understanding is predominantly studied as the Group Activity Recognition (GAR) task. However, existing GAR benchmarks suffer from coarse-grained activity vocabularies and offer data only in single-view form, which hinders the evaluation of state-of-the-art algorithms. To address these limitations, we introduce SGA-INTERACT, the first 3D skeleton-based benchmark for group activity understanding. It features complex activities inspired by basketball tactics, emphasizing rich spatial interactions and long-term dependencies. SGA-INTERACT also introduces the Temporal Group Activity Localization (TGAL) task, which extends group activity understanding to untrimmed sequences and fills the gap left by treating GAR as a standalone task. In addition to the benchmark, we propose One2Many, a novel framework that employs a pretrained 3D skeleton backbone for unified individual feature extraction. This framework aligns with the feature extraction paradigm in RGB-based methods, enabling direct evaluation of RGB-based models on skeleton-based benchmarks. We conduct extensive evaluations on SGA-INTERACT using two skeleton-based methods, three RGB-based methods, and a proposed baseline within the One2Many framework. The generally low performance of the baselines highlights the benchmark's challenges and motivates further advances in group activity understanding.
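
To make the TGAL setting more concrete: localizing an activity in an untrimmed sequence means predicting a (start, end) segment that is then compared with the annotated boundaries. The sketch below computes temporal intersection-over-union, a measure commonly used in temporal localization work. It is an illustration only: the abstract does not state which metric SGA-INTERACT uses, and the names `Segment` and `temporal_iou` are hypothetical.

```python
from typing import Tuple

# Hypothetical representation: (start, end) in frames or seconds, with end >= start.
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Overlap between a predicted and a ground-truth segment, in [0, 1]."""
    # Length of the overlapping interval; zero when the segments are disjoint.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union = sum of both lengths minus the part counted twice.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    # Guard against two zero-length segments.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # A prediction covering frames 0-10 against an annotation covering 5-15.
    print(temporal_iou((0, 10), (5, 15)))
```

Under this kind of measure, a prediction typically counts as correct only when its class label matches and its overlap exceeds a chosen threshold, which is why precise temporal boundaries in the annotations matter.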

Paper Structure

This paper contains 35 sections, 10 equations, 10 figures, 9 tables.

Figures (10)

  • Figure 1: SGA-INTERACT overview. SGA-INTERACT captures 3D skeleton sequences of game rounds and annotates tactical movements within each round. Featuring rich interactions and long-term dependencies, it contains challenging activities with high-level semantics. With clearly defined activity boundaries, it supports both GAR and TGAL tasks. Teams are distinguished by cool/warm colors.
  • Figure 2: Overview of prevailing GAR datasets. Because these datasets lack strong spatial and temporal dependencies, a group activity can be recognized from a small number of individual actions in key frames.
  • Figure 3: Overview of SGA-INTERACT annotation pipeline.
  • Figure 4: Statistics of SGA-INTERACT dataset for GAR and TGAL tasks.
  • Figure 5: Overview of One2Many framework.
  • ...and 5 more figures