Beyond the Individual: Introducing Group Intention Forecasting with SHOT Dataset

Ruixu Zhang, Yuran Wang, Xinyi Hu, Chaoyu Mai, Wenxuan Liu, Danni Xu, Xian Zhong, Zheng Wang

TL;DR

This work moves beyond individual intention recognition to forecasting group-level intentions. It defines the Group Intention Forecasting (GIF) task and introduces SHOT, a large-scale, multi-view dataset with rich per-player annotations. The authors propose GIFT, a spatio-temporal encoder-decoder framework that models evolving inter-player dynamics to forecast when a group intention occurs, expressed as the frame $f_\tau$ within a clip of length $T$. Experiments show SHOT's utility and that GIFT outperforms traditional temporal action localization baselines in timing accuracy (MAE), while lower F1 scores in early-stage forecasting highlight how hard the task is when initial cues are limited. The dataset and baseline provide a foundation for future research in group intention forecasting, with implications for sports analytics, safety, and intelligent systems that need timely interventions based on emergent collective goals.
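
The timing-accuracy comparison above can be illustrated with a minimal sketch. It assumes that MAE here means the mean absolute difference, in frames, between the forecast frame and the annotated frame of each clip; the summary does not spell out the exact definition, and the function name `frame_mae` is hypothetical.

```python
from typing import Sequence


def frame_mae(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Mean absolute error, in frames, between forecast and annotated frames.

    Assumption: one predicted frame index and one ground-truth frame index
    per clip, paired by position in the two sequences.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one clip is required")
    # Average the per-clip absolute frame offsets.
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
```

A lower value means the forecast frame lands closer to the annotated one on average; a value of 0 would mean every clip was timed exactly.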

Abstract

Intention recognition has traditionally focused on individual intentions, overlooking the complexities of collective intentions in group settings. To address this limitation, we introduce the concept of group intention, which represents shared goals emerging through the actions of multiple individuals, and Group Intention Forecasting (GIF), a novel task that forecasts when group intentions will occur by analyzing individual actions and interactions before the collective goal becomes apparent. To investigate GIF in a specific scenario, we propose SHOT, the first large-scale dataset for GIF, consisting of 1,979 basketball video clips captured from 5 camera views and annotated with 6 types of individual attributes. SHOT is designed with 3 key characteristics: multi-individual information, multi-view adaptability, and multi-level intention, making it well-suited for studying emerging group intentions. Furthermore, we introduce GIFT (Group Intention ForecasTer), a framework that extracts fine-grained individual features and models evolving group dynamics to forecast intention emergence. Experimental results confirm the effectiveness of SHOT and GIFT, establishing a strong foundation for future research in group intention forecasting. The dataset is available at https://xinyi-hu.github.io/SHOT_DATASET.

Paper Structure

This paper contains 43 sections, 9 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Dataset pipeline overview. Collection: videos are sourced from NBA highlights and full-game replays, then compiled into an unlabeled pool. Categorization: clips are classified by camera view and tactical type. Annotation: features are labeled manually or via tracking models. Structure: video annotations are stored in a JSON file with this structure. Review: annotations are reviewed and relabeled as needed.
  • Figure 2: Architecture of GIFT. GIFT extracts bounding box, pose, gaze, head pose, velocity, and role features from the $\tau$ seen frames ($\tau \in \{1, 2, \dots, T\}$). The STGCN Encoder models spatial and temporal patterns. The STGCN Decoder forecasts future features, from which the shooting role is identified to determine the frame number.
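
The last step in the Figure 2 caption, identifying the shooting role to determine the frame number, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the decoder output has already been reduced to a per-frame, per-player "shooter" score, and the function name `first_shooting_frame` and the `threshold` parameter are hypothetical.

```python
from typing import Optional, Sequence


def first_shooting_frame(
    shooter_scores: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Optional[int]:
    """Return the 1-based index of the first frame with a likely shooter.

    shooter_scores[t][i] is an assumed score that player i holds the
    shooting role in frame t + 1. Returns None if no frame qualifies.
    """
    for frame_number, scores in enumerate(shooter_scores, start=1):
        # A frame qualifies as soon as any player reaches the threshold.
        if scores and max(scores) >= threshold:
            return frame_number
    return None
```

Frames are numbered from 1 to match the caption's convention $\tau \in \{1, 2, \dots, T\}$.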