Towards Micro-Action Recognition with Limited Annotations: An Asynchronous Pseudo Labeling and Training Approach
Yan Zhang, Lechao Cheng, Yaxiong Wang, Zhun Zhong, Meng Wang
TL;DR
This work addresses the annotation bottleneck in Micro-Action Recognition by proposing Semi-Supervised MAR (SSMAR) and an asynchronous learning framework, APLT, that decouples pseudo-label generation from model training. Phase I generates high-quality pseudo-labels via semi-supervised clustering with labeled augmentation and self-adaptive thresholds, feeding a memory-based prototype classifier. Phase II trains the model with a combined loss that leverages both the parametric classifier and the fixed prototypes, with alternating offline and online updates to reduce overfitting. Across three MAR benchmarks, APLT consistently outperforms state-of-the-art SSL methods, including a 14.5-percentage-point improvement over FixMatch on MA-12 with 50% labeled data. These results demonstrate the practical impact of asynchronous pseudo-labeling and non-parametric supervision in low-label regimes.
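To make the Phase II objective concrete, the following is a minimal NumPy sketch of one way a parametric classifier term and a fixed-prototype term could be combined. The summary above does not give the exact form of the loss, so everything here is an assumption for illustration: the cross-entropy form of both terms, the cosine-similarity prototype logits, and the hypothetical parameters `temperature` and `weight`.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy(logits, targets):
    # Mean negative log-likelihood of the target classes.
    logp = log_softmax(logits)
    return -logp[np.arange(len(targets)), targets].mean()

def l2_normalize(x):
    # Row-wise L2 normalization, guarded against zero vectors.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)

def combined_loss(features, classifier_logits, prototypes, targets,
                  temperature=0.1, weight=1.0):
    # ASSUMPTION: prototype logits are temperature-scaled cosine similarities
    # between sample features and the fixed class prototypes.
    proto_logits = l2_normalize(features) @ l2_normalize(prototypes).T / temperature
    # ASSUMPTION: the two supervision signals are summed with a scalar weight.
    parametric_term = cross_entropy(classifier_logits, targets)
    prototype_term = cross_entropy(proto_logits, targets)
    return parametric_term + weight * prototype_term
```

In this sketch the prototypes are treated as constants during training, which mirrors the idea that the non-parametric classifier is fixed between pseudo-labeling phases; how the original method weights or schedules the two terms is not stated here.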
Abstract
Micro-Action Recognition (MAR) aims to classify subtle human actions in video. However, annotating MAR datasets is particularly challenging because the actions are so subtle. To this end, we introduce the setting of Semi-Supervised MAR (SSMAR), where only a portion of the samples is labeled. We first evaluate traditional Semi-Supervised Learning (SSL) methods on SSMAR and find that they tend to overfit to inaccurate pseudo-labels, leading to error accumulation and degraded performance. This issue primarily arises from the common practice of directly using the classifier's predictions as pseudo-labels to train the model. To solve this issue, we propose a novel framework, called Asynchronous Pseudo Labeling and Training (APLT), which explicitly separates the pseudo-labeling process from model training. Specifically, we introduce a semi-supervised clustering method during the offline pseudo-labeling phase to generate more accurate pseudo-labels. Moreover, we propose a self-adaptive thresholding strategy that dynamically filters noisy labels on a per-class basis. We then build a memory-based prototype classifier from the filtered pseudo-labels, which is kept fixed and used to guide the subsequent model training phase. By alternating the pseudo-labeling and model training phases in an asynchronous manner, the model not only learns from more accurate pseudo-labels but also avoids the overfitting issue. Experiments on three MAR datasets show that APLT outperforms state-of-the-art SSL methods by a large margin. For instance, APLT improves accuracy by 14.5% over FixMatch on the MA-12 dataset when using only 50% labeled data. Code will be publicly available.
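The pseudo-labeling phase described above (per-class filtering followed by a fixed prototype classifier) can be sketched as follows. This is an illustrative sketch only: the abstract does not specify the thresholding rule, so the rule used here (a hypothetical base threshold `base_tau` scaled by each class's mean confidence relative to the most confident class) is an assumption, as are the function names and the use of class-mean features as prototypes.

```python
import numpy as np

def class_adaptive_filter(confidences, pseudo_labels, num_classes, base_tau=0.9):
    # ASSUMED rule: classes whose pseudo-labels are less confident on average
    # receive a proportionally lower threshold than the most confident class.
    class_mean = np.zeros(num_classes)
    for c in range(num_classes):
        mask = pseudo_labels == c
        if mask.any():
            class_mean[c] = confidences[mask].mean()
    thresholds = base_tau * class_mean / max(class_mean.max(), 1e-12)
    # Keep a sample only if it clears the threshold of its own pseudo-class.
    keep = confidences >= thresholds[pseudo_labels]
    return keep, thresholds

def build_prototypes(features, pseudo_labels, keep, num_classes):
    # ASSUMPTION: one prototype per class, the L2-normalized mean of the
    # features that survived filtering. Classes with no kept samples stay zero.
    prototypes = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = keep & (pseudo_labels == c)
        if mask.any():
            prototypes[c] = features[mask].mean(axis=0)
    norms = np.linalg.norm(prototypes, axis=1, keepdims=True)
    return prototypes / np.maximum(norms, 1e-12)

def prototype_predict(features, prototypes):
    # Non-parametric prediction: nearest prototype by cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    feats = features / np.maximum(norms, 1e-12)
    return (feats @ prototypes.T).argmax(axis=1)
```

Once built, the prototypes would be held fixed throughout the next training phase and only recomputed at the following offline pseudo-labeling phase, which is what makes the procedure asynchronous in the sense described above.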
