Self-supervised Multi-actor Social Activity Understanding in Streaming Videos
Shubham Trehan, Sathyanarayanan N. Aakur
TL;DR
This work tackles social activity recognition (SAR) in streaming videos under minimal supervision by introducing self-supervised multi-actor predictive learning. It builds a visual–semantic action graph that represents actors and their social interactions, and applies spatial and temporal graph smoothing to propagate context across actors and frames. Training optimizes a combined objective $L_{total} = \lambda_1 L_{global} + \lambda_2 L_{actor}$, in which $\lambda_1$ and $\lambda_2$ weight the global and actor-level terms, to predict future scene dynamics and actor-level changes without dense labels. On CAD and SocialCAD, the method delivers competitive group-activity and social-understanding performance, and it generalizes to arbitrary action localization on UCF Sports, JHMDB, and THUMOS'13, indicating potential for scalable, privacy-conscious SAR in streaming settings.
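As a rough illustration of the combined objective $L_{total} = \lambda_1 L_{global} + \lambda_2 L_{actor}$, the sketch below forms the weighted sum from two prediction errors. It is not the authors' implementation: the helper names (`mse`, `total_loss`), the use of mean squared error between predicted and observed features, and the default weights are all assumptions made for the example.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between a predicted and an observed feature vector.

    Assumption: prediction error is measured with MSE; the source does not
    specify the form of L_global or L_actor.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(
    l_global: float,
    l_actor: float,
    lambda_1: float = 1.0,  # hypothetical default weight
    lambda_2: float = 1.0,  # hypothetical default weight
) -> float:
    """Weighted sum L_total = lambda_1 * L_global + lambda_2 * L_actor."""
    return lambda_1 * l_global + lambda_2 * l_actor


# Toy example: scene-level and actor-level features predicted for the next frame.
l_global = mse([1.0, 2.0], [1.0, 4.0])  # error on the global scene prediction
l_actor = mse([0.0, 0.0], [2.0, 2.0])   # error on one actor's prediction
loss = total_loss(l_global, l_actor, lambda_1=0.5, lambda_2=0.25)
```

Because both terms come from comparing predictions with what is observed next in the stream, no activity labels are needed to compute this quantity; the weights only set the balance between scene-level and actor-level prediction.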
Abstract
This work addresses the problem of Social Activity Recognition (SAR), a critical component in real-world tasks like surveillance and assistive robotics. Unlike traditional event understanding approaches, SAR requires modeling individual actors' appearance and motion and contextualizing them within their social interactions. Traditional action localization methods fall short because they assume a single actor performing a single action. Previous SAR approaches have relied heavily on densely annotated data, but privacy concerns limit their applicability in real-world settings. In this work, we propose a self-supervised approach based on multi-actor predictive learning for SAR in streaming videos. Using a visual-semantic graph structure, we model social interactions, enabling relational reasoning for robust performance with minimal labeled data. The proposed framework achieves competitive performance on standard group activity recognition benchmarks. Evaluation on three publicly available action localization benchmarks demonstrates its generalizability to arbitrary action localization.
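One way to picture relational reasoning over a graph of actors is a single spatial smoothing step, in which each actor's feature vector is blended with the average of its connected neighbors. The sketch below is illustrative only and is not taken from the paper: the function name `smooth_actor_features`, the binary adjacency matrix, and the mixing weight `alpha` are assumptions.

```python
from typing import List, Sequence


def smooth_actor_features(
    features: Sequence[Sequence[float]],
    adjacency: Sequence[Sequence[int]],
    alpha: float = 0.5,  # hypothetical mixing weight between self and neighbors
) -> List[List[float]]:
    """Blend each actor's features with the mean of its neighbors' features.

    features[i] is the feature vector of actor i; adjacency[i][j] is nonzero
    when actors i and j are linked in the (assumed) interaction graph.
    Actors without neighbors keep their features unchanged.
    """
    smoothed: List[List[float]] = []
    for i, feat in enumerate(features):
        neighbors = [j for j, linked in enumerate(adjacency[i]) if linked and j != i]
        if not neighbors:
            smoothed.append(list(feat))
            continue
        neighbor_mean = [
            sum(features[j][d] for j in neighbors) / len(neighbors)
            for d in range(len(feat))
        ]
        smoothed.append(
            [(1 - alpha) * f + alpha * m for f, m in zip(feat, neighbor_mean)]
        )
    return smoothed


# Three actors: the first two interact, the third is isolated.
features = [[0.0, 0.0], [2.0, 4.0], [10.0, 10.0]]
adjacency = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
result = smooth_actor_features(features, adjacency, alpha=0.5)
```

In this toy case the two interacting actors move toward each other's features while the isolated actor is untouched, which is the sense in which context propagates along graph edges. A temporal variant would apply the same blending across consecutive frames of one actor rather than across actors within a frame.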
