Enhancing Action Recognition from Low-Quality Skeleton Data via Part-Level Knowledge Distillation
Cuiwei Liu, Youzhi Jiang, Chong Du, Zhaokui Li
TL;DR
The paper tackles action recognition from noisy, low-quality skeleton data by introducing a general teacher–student knowledge-distillation framework that transfers fine-grained part-level knowledge from high-quality skeletons. It combines a part-based skeleton matching strategy, an action-specific high-efficiency part matrix, and a part-level multi-sample contrastive loss to align part representations across heterogeneous pose graphs, even for low-quality samples that have no paired high-quality counterpart. The approach yields consistent improvements over strong baselines and state-of-the-art methods on NTU-RGB+D, Penn Action, and SYSU 3D HOI, demonstrating robustness to occlusion and missing joints in real-world settings. These results highlight the practical value of the approach for robust, real-time action recognition in scenarios with limited or degraded pose information.
Abstract
Skeleton-based action recognition is vital for comprehending human-centric videos and has applications in diverse domains. One of its challenges is dealing with low-quality data, such as skeletons that have missing or inaccurate joints. This paper addresses the issue of enhancing action recognition from low-quality skeletons through a general knowledge distillation framework. The proposed framework employs a teacher-student setup, where a teacher model trained on high-quality skeletons guides the learning of a student model that handles low-quality skeletons. To bridge the gap between heterogeneous high-quality and low-quality skeletons, we present a novel part-based skeleton matching strategy, which exploits shared body parts to facilitate local action pattern learning. An action-specific part matrix is developed to emphasize critical parts for different actions, enabling the student model to distill discriminative part-level knowledge. A novel part-level multi-sample contrastive loss transfers knowledge from multiple high-quality skeletons to low-quality ones, which allows the framework to also train on low-quality skeletons that lack corresponding high-quality matches. Comprehensive experiments conducted on the NTU-RGB+D, Penn Action, and SYSU 3D HOI datasets demonstrate the effectiveness of the proposed knowledge distillation framework.
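
The abstract does not give the formula for the part-level multi-sample contrastive loss, so the following is only a minimal sketch of one plausible form, not the authors' implementation. It assumes an InfoNCE-style objective with cosine similarity: for each body part, the student's feature from a low-quality skeleton is pulled toward the teacher's features for the same part from every high-quality skeleton of the same action class, and pushed away from those of other classes. The per-part weights stand in for one row of the action-specific part matrix. The function names, the temperature `tau`, and the layout of `teacher_bank` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def part_contrastive_loss(student_parts, student_label, teacher_bank,
                          part_weights, tau=0.1):
    """Weighted multi-positive contrastive loss over body parts (assumed form).

    student_parts: list of P feature vectors from the student model.
    student_label: action label of the low-quality sample.
    teacher_bank:  list of (label, parts) pairs from the teacher model,
                   where parts is a list of P feature vectors.
    part_weights:  P positive weights for this action (hypothetical stand-in
                   for one row of the action-specific part matrix).
    """
    positives = [i for i, (label, _) in enumerate(teacher_bank)
                 if label == student_label]
    if not positives:
        raise ValueError("need at least one high-quality sample of the same class")

    total = 0.0
    for p, s in enumerate(student_parts):
        # Similarity of this student part to the same part in every teacher sample.
        logits = [cosine(s, parts[p]) / tau for _, parts in teacher_bank]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Average negative log-probability over all same-class teacher samples.
        part_loss = -sum(logits[i] - log_denom for i in positives) / len(positives)
        total += part_weights[p] * part_loss
    return total / sum(part_weights)
```

Because the positives are all same-class teacher samples rather than one paired sample, a low-quality skeleton with no high-quality counterpart can still contribute to the loss, which is the property the abstract highlights.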
