Parts-Mamba: Augmenting Joint Context with Part-Level Scanning for Occluded Human Skeleton
Tianyi Shen, Huijuan Xu, Nilesh Ahuja, Omesh Tickoo, Philip Shin, Vijaykrishnan Narayanan
TL;DR
This work addresses occlusion in skeleton-based action recognition, where GCNs that depend on local joint context degrade once joints or frames are missing. It introduces Parts-Mamba, a hybrid GCN-Mamba framework that combines part-wise and body scanning, topological graph modeling, and gated fusion to retain context from distant joints, complemented by a Mamba Temporal Encoder for temporal dynamics. The approach achieves state-of-the-art results on NTU-60 and NTU-120 across multiple occlusion scenarios, including parts-specific, random, and periodic frame occlusions, with notable gains over prior methods and lower computational cost than ViT or MLP-Mixer baselines. These gains in accuracy and efficiency make skeleton action recognition more robust to occlusion in real-world downstream applications.
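To make the gated fusion idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the part-level and body-level branches each produce per-joint features of the same shape, and that a sigmoid gate computed from their concatenation blends them. The function name `gated_fusion` and the parameters `W` and `b` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(part_feat, body_feat, W, b):
    """Blend part-level and body-level joint features with a learned gate.

    part_feat, body_feat: arrays of shape (V, D), one row per joint.
    W: hypothetical gate weights of shape (2 * D, D); b: bias of shape (D,).
    This is an assumed form of gating, not the formulation from the paper.
    """
    gate_input = np.concatenate([part_feat, body_feat], axis=-1)  # (V, 2D)
    gate = sigmoid(gate_input @ W + b)                            # (V, D), in (0, 1)
    # A gate near 1 favors part-level context, near 0 favors body-level context.
    return gate * part_feat + (1.0 - gate) * body_feat

# Example with random features for 25 joints and 8 channels.
rng = np.random.default_rng(0)
part = rng.normal(size=(25, 8))
body = rng.normal(size=(25, 8))
W = rng.normal(scale=0.1, size=(16, 8))
b = np.zeros(8)
fused = gated_fusion(part, body, W, b)
```

Because the gate is computed per joint and per channel, a joint whose own part is occluded can, in principle, lean more heavily on the body-level features.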
Abstract
Skeleton action recognition is the task of recognizing human actions from human skeletons. Graph convolutional networks (GCNs) have driven major advances in this task. In real-world scenarios, however, captured skeletons are not always perfect or complete: occlusion of body parts leaves skeletons with missing parts, and poor communication quality leaves videos with missing frames. Under such non-idealities, existing GCN models perform poorly because the local context they rely on is missing. To address this limitation, we propose Parts-Mamba, a hybrid GCN-Mamba model designed to capture and maintain contextual information from distant joints. Parts-Mamba captures part-specific information through parts-specific scanning and preserves non-neighboring joint context via a parts-body fusion module. We evaluate the model on the NTU RGB+D 60 and NTU RGB+D 120 datasets under different occlusion settings, where it achieves up to a 12.9% improvement in accuracy.
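The occlusion settings mentioned above can be illustrated with a short NumPy sketch. It is an assumption-laden illustration, not the paper's evaluation protocol: skeletons are taken to be arrays of shape (T, V, C) with V = 25 joints, occluded joints or frames are set to zero, "random" occlusion is interpreted as independent per-frame joint dropout, and the `PARTS` grouping of joint indices is hypothetical.

```python
import numpy as np

# Hypothetical grouping of 25 joints (0-indexed) into five body parts.
# The part definition used in the paper may differ.
PARTS = {
    "torso": [0, 1, 2, 3, 20],
    "left_arm": [4, 5, 6, 7, 21, 22],
    "right_arm": [8, 9, 10, 11, 23, 24],
    "left_leg": [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}

def occlude_part(seq, part):
    """Zero every joint of one body part in all frames."""
    out = seq.copy()
    out[:, PARTS[part], :] = 0.0
    return out

def occlude_random_joints(seq, ratio, rng):
    """Zero each joint in each frame independently with probability `ratio`."""
    num_frames, num_joints, _ = seq.shape
    mask = rng.random((num_frames, num_joints)) < ratio
    out = seq.copy()
    out[mask] = 0.0  # boolean mask over (T, V) clears all C coordinates
    return out

def occlude_periodic_frames(seq, period):
    """Zero every `period`-th frame, starting with frame 0."""
    out = seq.copy()
    out[::period] = 0.0
    return out

# Example: a dummy sequence of 6 frames, 25 joints, 3D coordinates.
seq = np.ones((6, 25, 3))
no_left_arm = occlude_part(seq, "left_arm")
dropped = occlude_random_joints(seq, 0.3, np.random.default_rng(0))
gapped = occlude_periodic_frames(seq, 3)
```

Zero-filling is only one way to represent missing data; a pipeline could instead drop the affected joints or frames, or mark them with a separate validity mask.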
