VicKAM: Visual Conceptual Knowledge Guided Action Map for Weakly Supervised Group Activity Recognition
Zhuming Wang, Yihao Zheng, Jiarui Li, Yaofei Wu, Yan Huang, Zun Li, Lifang Wu, Liang Wang
TL;DR
VicKAM tackles weakly supervised group activity recognition by introducing visual conceptual knowledge to ground action semantics in visual representations. It constructs action prototypes from labeled data and builds activity-specific action statistics, then generates action maps via image correlation to capture where actions occur, subsequently enriching these maps with semantic embeddings. The method uses a two-stage training regime that first leverages individual annotations and then operates without them, combining global and group-level cues for final GAR predictions. Experiments on Volleyball and NBA datasets demonstrate strong performance, particularly under limited data, while revealing some domain-transfer challenges and the need for actor-level supervision for knowledge grounding.
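As a rough illustration of the first two ingredients named above, the sketch below builds one prototype per action and a table of action frequencies per group activity. It assumes that a prototype is the mean feature vector of an action class and that the statistics are row-normalised co-occurrence counts; the summary confirms neither choice, and the function names `action_prototypes` and `action_statistics` are hypothetical.

```python
import numpy as np

def action_prototypes(features, action_labels, num_actions):
    """Assumed form: mean feature vector per action class."""
    features = np.asarray(features, dtype=float)
    action_labels = np.asarray(action_labels)
    protos = np.zeros((num_actions, features.shape[1]))
    for a in range(num_actions):
        mask = action_labels == a
        if mask.any():  # leave the prototype at zero if the class is unseen
            protos[a] = features[mask].mean(axis=0)
    return protos

def action_statistics(action_labels, activity_labels, num_actions, num_activities):
    """Assumed form: stats[g, a] estimates P(action a | group activity g)."""
    counts = np.zeros((num_activities, num_actions))
    # Accumulate one count per (activity, action) pair, handling repeats.
    np.add.at(counts, (np.asarray(activity_labels), np.asarray(action_labels)), 1)
    totals = counts.sum(axis=1, keepdims=True)
    # Rows for activities with no samples stay at zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
```

Each row of the resulting table describes how individual actions are distributed under one group activity, which is the kind of information the method uses to relate action maps to activities.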
Abstract
Existing weakly supervised group activity recognition methods rely on object detectors or attention mechanisms to capture key areas automatically. However, they overlook the semantic information associated with the captured areas, which may adversely affect recognition performance. In this paper, we propose a novel framework named Visual Conceptual Knowledge Guided Action Map (VicKAM), which effectively captures the locations of individual actions and integrates them with action semantics for weakly supervised group activity recognition. It generates individual action prototypes from the training set as visual conceptual knowledge to bridge action semantics and visual representations. Guided by this knowledge, VicKAM produces action maps that indicate the likelihood of each action occurring at various locations, based on the image correlation theorem. It further augments individual action maps using group-activity-related statistical information, which represents the distribution of individual actions under different group activities, to establish connections between action maps and specific group activities. The augmented action map is combined with action semantic representations for group activity recognition. Extensive experiments on two public benchmarks, the Volleyball and the NBA datasets, demonstrate the effectiveness of our proposed method, even in cases of limited training data. The code will be released later.
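The correlation theorem mentioned in the abstract states that cross-correlation in the spatial domain equals an element-wise product, with one factor conjugated, in the frequency domain. The sketch below shows only that general idea. It assumes the prototype is a small multi-channel patch, that correlation is circular, and that channels are summed; the abstract does not give VicKAM's actual formulation, and `correlation_map` is a hypothetical name.

```python
import numpy as np

def correlation_map(feature_map, prototype):
    """Circular cross-correlation of a (C, H, W) feature map with a
    (C, h, w) prototype, computed via FFT and summed over channels."""
    _, H, W = feature_map.shape
    F = np.fft.fft2(feature_map, axes=(-2, -1))
    # s=(H, W) zero-pads the prototype to the feature-map size.
    G = np.fft.fft2(prototype, s=(H, W), axes=(-2, -1))
    # Correlation theorem: corr = IFFT(F * conj(G)) for real inputs.
    corr = np.fft.ifft2(F * np.conj(G), axes=(-2, -1)).real
    return corr.sum(axis=0)

# Toy check: place a patch in an empty map and correlate with the same patch.
patch = np.array([[[1.0, 2.0], [3.0, 4.0]]])   # shape (1, 2, 2)
fmap = np.zeros((1, 8, 8))
fmap[:, 2:4, 3:5] = patch
response = correlation_map(fmap, patch)
peak = np.unravel_index(np.argmax(response), response.shape)
```

In this toy case the response peaks at the offset where the patch was placed, row 2 and column 3. A response map of this kind can be read as a per-location score for one action, and it would still need normalising before being treated as a likelihood.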
