Informative Sample Selection Model for Skeleton-based Action Recognition with Limited Training Samples
Zhigang Tu, Zhengbo Zhang, Jia Gong, Junsong Yuan, Bo Du
TL;DR
This work tackles skeleton-based action recognition under limited annotations by reframing semi-supervised 3D Action Recognition via Active Learning (S3ARAL) as a Markov Decision Process. It introduces an Informative Sample Selection Model (ISSM) trained with Double DQN, where state representations are projected to hyperbolic space to better capture hierarchical skeleton structure, and rewards reflect the action recognizer's performance gains. A meta-tuning strategy based on meta-learning accelerates deployment when expanding labeled data. Across three benchmarks (UWA3D, NW-UCLA, NTU RGB+D 60), the method achieves state-of-the-art accuracy over varying labeling budgets and demonstrates strong generalization, with ablations confirming the value of hyperbolic representations, MMD-based state gaps, and meta-tuning.
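The TL;DR says the ISSM is trained with Double DQN and that rewards reflect the action recognizer's performance gains. It does not give the exact update rule. The sketch below shows the standard Double DQN regression target: the online network picks the next action and the target network scores that action. It assumes three things not stated in the summary: a discrete action space with one Q-value per candidate unlabeled sequence, a reward equal to the recognizer's accuracy change after annotation, and episodes that end when the labeling budget is spent. The function name, arguments, and all numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def double_dqn_targets(rewards, q_online_next, q_target_next, dones, gamma=0.99):
    """Compute Double DQN regression targets for a batch of transitions.

    rewards:        (B,) rewards; assumed here to be the recognizer's accuracy gain
    q_online_next:  (B, A) online-network Q-values for the next states
    q_target_next:  (B, A) target-network Q-values for the next states
    dones:          (B,) 1.0 if the episode ended (e.g. budget spent), else 0.0
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)
    # The online network selects the greedy next action...
    best_actions = np.argmax(q_online_next, axis=1)
    # ...and the target network evaluates it, which reduces overestimation.
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    # Terminal transitions do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * evaluated


# Hypothetical reward: recognizer accuracy after labeling minus accuracy before.
acc_before = np.array([0.610, 0.640])
acc_after = np.array([0.625, 0.640])
rewards = acc_after - acc_before

q_online_next = np.array([[0.2, 0.9, 0.1],
                          [0.5, 0.3, 0.4]])
q_target_next = np.array([[0.3, 0.6, 0.8],
                          [0.7, 0.2, 0.1]])
dones = np.array([0.0, 1.0])

targets = double_dqn_targets(rewards, q_online_next, q_target_next, dones, gamma=0.9)
print(targets)
```

Consider the first transition. Vanilla DQN would bootstrap from the target network's own maximum (0.8). Double DQN instead evaluates the online network's choice (action 1, which the target network scores at 0.6). The second transition is terminal, so its target is just its reward.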
Abstract
Skeleton-based human action recognition aims to classify human skeletal sequences, which are spatiotemporal representations of actions, into predefined categories. Annotating skeletal sequences is costly. To reduce this reliance on annotation while maintaining competitive recognition accuracy, researchers have proposed the task of 3D Action Recognition with Limited Training Samples, also known as semi-supervised 3D Action Recognition. Active learning, which proactively selects the most informative unlabeled samples for annotation, has also been explored in semi-supervised 3D Action Recognition for choosing training samples. Specifically, prior work adopts an encoder-decoder framework to embed skeleton sequences into a latent space. In this space, clustering information is combined with a margin-based selection strategy that uses a multi-head mechanism to identify the most informative sequences in the unlabeled set for annotation. However, the most representative skeleton sequences are not necessarily the most informative for the action recognizer, because the model may already have acquired similar knowledge from previously seen skeleton samples. To address this, we take a new perspective and reformulate semi-supervised 3D action recognition via active learning as a Markov Decision Process (MDP). Building on the MDP framework and its training paradigm, we train an informative sample selection model that guides the selection of skeleton sequences for annotation. To enhance the representational capacity of the factors in the state-action pairs, we project them from Euclidean space into hyperbolic space. Furthermore, we introduce a meta-tuning strategy to accelerate the deployment of our method in real-world scenarios. Extensive experiments on three 3D action recognition benchmarks demonstrate the effectiveness of our method.
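The abstract says that the factors in the state-action pairs are projected from Euclidean space into hyperbolic space. It does not say which model of hyperbolic geometry is used or how the projection works. This sketch assumes one common choice: the Poincaré ball of curvature -c. Euclidean features are mapped onto the ball with the exponential map at the origin, exp_0(v) = tanh(sqrt(c)·‖v‖) · v / (sqrt(c)·‖v‖), and points are compared with the Poincaré geodesic distance. The names `expmap0` and `poincare_dist` and the toy feature values are hypothetical illustrations and do not come from the paper's code.

```python
import numpy as np


def expmap0(v, c=1.0, eps=1e-15):
    """Exponential map at the origin: Euclidean vectors -> Poincare ball of curvature -c."""
    v = np.asarray(v, dtype=float)
    sqrt_c = np.sqrt(c)
    # Clamp the norm away from zero so the zero vector maps to the origin without NaNs.
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)


def poincare_dist(x, y, c=1.0, eps=1e-15):
    """Geodesic distance between points inside the Poincare ball of curvature -c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    # The (1 - c * ||x||^2) terms shrink toward the boundary, so distances blow up there.
    denom = (1.0 - c * np.sum(x ** 2, axis=-1)) * (1.0 - c * np.sum(y ** 2, axis=-1))
    arg = 1.0 + 2.0 * c * sq_diff / np.maximum(denom, eps)
    return np.arccosh(arg) / np.sqrt(c)


# Hypothetical Euclidean state features (e.g. produced by a skeleton encoder).
feats = np.array([[0.3, 0.4],
                  [0.0, 0.0],
                  [3.0, 4.0]])
h = expmap0(feats, c=1.0)
print(np.linalg.norm(h, axis=-1))  # every norm stays below 1 / sqrt(c)
print(poincare_dist(h[0], h[1]))   # distance from the origin equals 2 * ||v||
```

The distance from the origin to expmap0(v) is 2‖v‖ for any curvature c. Near the boundary, however, distances grow very quickly. This property lets the Poincaré ball embed tree-like hierarchies, such as the joint hierarchy of a skeleton, with low distortion. That is the usual motivation for hyperbolic representations.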
