Balanced Representation Learning for Long-tailed Skeleton-based Action Recognition
Hongda Liu, Yunlong Wang, Min Ren, Junxing Hu, Zhengquan Luo, Guangqi Hou, Zhenan Sun
TL;DR
Long-tailed data severely biases skeleton-based action representations, limiting recognition performance. The authors introduce Balanced Representation Learning (BRL), combining Spatial-Temporal Action Exploration (STAE) with Rebalanced Partial Mixup and Temporal Reverse Perception, plus a Detached Action-Aware Learning Schedule (DAA) and a skip-modal ensemble to learn unbiased representations. STAE enriches the sample space with spatial and temporal diversity, while DAA modulates learning to emphasize tail classes using $\beta_y$ and $\beta$-weighted losses, e.g., $\beta_y = \max(\dots)$, with gradients guided by the factor $\frac{1-\beta_y}{1-\beta_y^{n_y}}$; BRL also incorporates a skip-modal fusion to exploit alternative joint relationships. Evaluations on NTU RGB+D 60/120, Northwestern-UCLA, and Kinetics Skeleton 400 show consistent, substantial gains over state-of-the-art long-tailed methods, with ablations confirming the contribution of each component. BRL provides a practical path to robust long-tailed skeleton action recognition across backbones and datasets, with public code enabling broader adoption.
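To make the weighting factor in the summary above concrete, the following is a minimal sketch that evaluates $\frac{1-\beta_y}{1-\beta_y^{n_y}}$ for a class with $n_y$ training samples. Only the form of the factor comes from the summary. The function name `class_balanced_weight`, the sample counts, and the value of `beta` are illustrative assumptions, and the sketch does not show how BRL actually chooses $\beta_y$ or applies the factor inside its loss.

```python
def class_balanced_weight(n_y: int, beta_y: float) -> float:
    """Evaluate (1 - beta_y) / (1 - beta_y ** n_y) for one class.

    n_y is the number of training samples in the class and beta_y is a
    value in [0, 1). Both names are illustrative, not taken from the BRL code.
    """
    if n_y < 1:
        raise ValueError("n_y must be a positive integer")
    if not 0.0 <= beta_y < 1.0:
        raise ValueError("beta_y must lie in [0, 1)")
    return (1.0 - beta_y) / (1.0 - beta_y ** n_y)


# Hypothetical long-tailed class counts (assumed for illustration only).
counts = {"head": 1000, "tail": 10}
beta = 0.99  # assumed value; the summary does not state how beta_y is set

weights = {name: class_balanced_weight(n, beta) for name, n in counts.items()}

for name, w in weights.items():
    print(f"{name}: n_y={counts[name]}, weight={w:.4f}")
```

Under these assumed values the tail class receives a larger factor than the head class, which matches the stated goal of emphasizing tail classes. When `beta_y` is 0 the factor equals 1 for every class, so no reweighting takes place.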
Abstract
Skeleton-based action recognition has recently made significant progress. However, data imbalance remains a major challenge in real-world scenarios. The performance of current action recognition algorithms declines sharply when training data suffers from heavy class imbalance. Imbalanced data degrades the representations learned by these methods and becomes the bottleneck for action recognition. Learning unbiased representations from imbalanced action data is therefore the key to long-tailed action recognition. In this paper, we propose a novel balanced representation learning method to address the long-tailed problem in action recognition. Firstly, a spatial-temporal action exploration strategy is presented to expand the sample space effectively, generating more valuable samples in a rebalanced manner. Secondly, we design a detached action-aware learning schedule to further mitigate the bias in the representation space. The schedule detaches the representation learning of tail classes from training and introduces an action-aware loss to impose more effective constraints. Additionally, a skip-modal representation is proposed to provide complementary structural information. The proposed method is validated on four skeleton datasets: NTU RGB+D 60, NTU RGB+D 120, NW-UCLA, and Kinetics. It not only achieves consistently large improvements over state-of-the-art (SOTA) methods, but also demonstrates superior generalization capacity in extensive experiments. Our code is available at https://github.com/firework8/BRL.
