Balanced Representation Learning for Long-tailed Skeleton-based Action Recognition

Hongda Liu; Yunlong Wang; Min Ren; Junxing Hu; Zhengquan Luo; Guangqi Hou; Zhenan Sun

TL;DR

Long-tailed data severely biases skeleton-based action representations, limiting recognition performance. The authors introduce Balanced Representation Learning (BRL), which combines Spatial-Temporal Action Exploration (STAE), built from Rebalanced Partial Mixup and Temporal Reverse Perception, with a Detached Action-Aware Learning Schedule (DAA) and a skip-modal ensemble to learn unbiased representations. STAE enriches the sample space with spatial and temporal diversity, while DAA shifts the emphasis of learning toward tail classes through $\beta_y$- and $\beta$-weighted losses, e.g., $\beta_y = \max(\ldots)$ and gradients guided by $\frac{1-\beta_y}{1-\beta_y^{n_y}}$, where $n_y$ is the number of training samples in class $y$. The skip-modal ensemble exploits alternative joint relationships. Evaluations on NTU RGB+D 60/120, Northwestern-UCLA, and Kinetics Skeleton 400 show consistent, substantial gains over state-of-the-art long-tailed methods, with ablations confirming the contribution of each component. BRL provides a practical path to robust long-tailed skeleton action recognition across backbones and datasets, and its public code enables broader adoption.
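The factor $\frac{1-\beta_y}{1-\beta_y^{n_y}}$ quoted in the TL;DR has the form of the "effective number of samples" class weight used in class-balanced losses. The Python sketch below assumes $n_y$ is the training-sample count of class $y$ and uses the weight to scale a cross-entropy loss. The function names, the default $\beta = 0.999$, and the rescaling of weights to sum to the number of classes are illustrative assumptions, not BRL's released code.

```python
import numpy as np


def class_balanced_weights(counts, beta=0.999):
    """Per-class weights w_y = (1 - beta) / (1 - beta**n_y).

    counts: positive training-sample count n_y for each class y.
    beta: a scalar, or a per-class array (to mimic a class-specific beta_y).
    The weights are rescaled so that they sum to the number of classes.
    """
    counts = np.asarray(counts, dtype=np.float64)
    beta = np.asarray(beta, dtype=np.float64)
    weights = (1.0 - beta) / (1.0 - np.power(beta, counts))
    return weights * len(counts) / weights.sum()


def weighted_cross_entropy(logits, labels, weights):
    """Mean cross-entropy in which each sample is scaled by its class weight.

    logits: array of shape (N, num_classes); labels: integer array of shape (N,).
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]  # per-sample negative log-likelihood
    return float((weights[labels] * nll).mean())
```

With counts `[1000, 100, 10]`, the rarest class receives the largest weight and the most frequent class the smallest. Passing a per-class array as `beta` corresponds to the class-specific $\beta_y$ written in the TL;DR; how BRL actually sets $\beta_y$ is elided in this summary.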

Abstract

Skeleton-based action recognition has recently made significant progress. However, data imbalance remains a major challenge in real-world scenarios. The performance of current action recognition algorithms declines sharply when the training data is heavily class-imbalanced. Imbalanced data degrades the representations these methods learn and becomes the bottleneck for action recognition. Learning unbiased representations from imbalanced action data is therefore the key to long-tailed action recognition. In this paper, we propose a novel balanced representation learning method to address the long-tailed problem in action recognition. First, a spatial-temporal action exploration strategy is presented to expand the sample space effectively, generating more valuable samples in a rebalanced manner. Second, we design a detached action-aware learning schedule to further mitigate the bias in the representation space. The schedule detaches the representation learning of tail classes from training and introduces an action-aware loss to impose more effective constraints. Additionally, a skip-modal representation is proposed to provide complementary structural information. The proposed method is validated on four skeleton datasets: NTU RGB+D 60, NTU RGB+D 120, NW-UCLA, and Kinetics. It not only achieves consistently large improvements over state-of-the-art (SOTA) methods, but also demonstrates superior generalization capacity in extensive experiments. Our code is available at https://github.com/firework8/BRL.
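The TL;DR names Temporal Reverse Perception as the temporal half of the exploration strategy. One minimal reading, assumed here rather than taken from the paper, is to play a skeleton sequence backwards to obtain an extra training sample. The sketch assumes sequences shaped (T, V, C) (frames, joints, coordinates) with optional zero padding after the last valid frame. `temporal_reverse` and `random_temporal_reverse` are hypothetical helpers, and the returned flag is only a convenience in case reversal is also used as an auxiliary prediction target.

```python
import numpy as np


def temporal_reverse(seq, num_valid=None):
    """Reverse the frame order of a skeleton sequence shaped (T, V, C).

    If only the first `num_valid` frames hold real data (the rest being zero
    padding), only that prefix is reversed and the padding stays at the end.
    """
    seq = np.asarray(seq)
    if num_valid is None:
        num_valid = seq.shape[0]
    out = seq.copy()
    out[:num_valid] = seq[:num_valid][::-1]
    return out


def random_temporal_reverse(seq, p=0.5, rng=None, num_valid=None):
    """Reverse `seq` with probability p; the action label is left unchanged.

    Returns (sequence, was_reversed).
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p:
        return temporal_reverse(seq, num_valid), True
    return np.asarray(seq).copy(), False
```

Keeping padding at the end matters because many skeleton pipelines zero-pad short clips; reversing the whole tensor would otherwise move the padding to the start of the sequence.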

Paper Structure (33 sections, 9 equations, 7 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Skeleton-based Action Recognition
Long-tailed Learning
Long-tailed Skeleton-based Action Recognition
Methodology
Data Pre-processing
Spatial-Temporal Action Exploration
Rebalanced Partial Mixup
Temporal Reverse Perception
Detached Action-Aware Learning Schedule
Ensemble with Multi-Modal Representation
Experiments
Datasets
NTU RGB+D 60
...and 18 more sections

Figures (7)

Figure 1: Architecture Overview. (a) Long-tailed action data. The head class and tail class of a long-tailed action dataset (NTU RGB+D-LT in this example) have drastically different numbers of samples. (b) An overview of the proposed architecture. (c) Spatial-temporal action exploration strategy to generate more valuable skeletal data. (d) ST-GCN++ is adopted as the backbone. (e) Detached action-aware learning is proposed to mitigate representation bias in long-tailed action recognition.
Figure 2: An illustration of the proposed method. (Best viewed in color.) First, the spatial-temporal action exploration strategy expands the corresponding sample space. Second, because the skewed data distribution distorts the original decision boundary, the detached action-aware learning schedule adjusts the decision boundaries to further mitigate representation bias. By unifying both components, the proposed method learns balanced representations from long-tailed action data.
Figure 3: Rebalanced partial mixup combines parts of the body skeleton to generate more valuable skeleton data in a rebalanced manner. With the rebalanced label design, the strategy alleviates the effects of imbalanced data, resulting in a more balanced distribution of action data.
Figure 4: The number of training samples per class in the constructed long-tailed action datasets.
Figure 5: The t-SNE visualization of the features learned by (a) the baseline, (b) the baseline with augmentation, (c) the proposed method with only the STAE strategy, and (d) the proposed balanced representation learning method. Different colors indicate different classes. Only the learned representations of five head classes and five tail classes are visualized. The sample points of tail classes are overlaid with a grey area to show the improvements brought by different components.
...and 2 more figures
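Figure 3 describes rebalanced partial mixup as pasting body parts between skeletons and pairing the result with a rebalanced label. The Python sketch below shows one way such a scheme could work. The body-part joint grouping, the CutMix-style mixing ratio (fraction of pasted joints), and especially the inverse-frequency label rebalancing are hypothetical choices for illustration, not the authors' formulation.

```python
import numpy as np


def rebalanced_partial_mixup(x_a, y_a, x_b, y_b, part_joints, class_counts, num_classes):
    """Paste the joints in `part_joints` from x_b into x_a and build a soft label.

    x_a, x_b: skeleton sequences shaped (T, V, C) with identical T and V.
    part_joints: indices of the joints forming one body part (hypothetical grouping).
    class_counts: training-sample count per class, used for the rebalancing step.
    """
    x_a, x_b = np.asarray(x_a), np.asarray(x_b)
    part_joints = np.unique(np.asarray(part_joints, dtype=int))  # drop duplicates
    mixed = x_a.copy()
    mixed[:, part_joints, :] = x_b[:, part_joints, :]

    # Fraction of joints taken from x_b (CutMix-style mixing ratio).
    lam = len(part_joints) / x_a.shape[1]

    # Hypothetical rebalancing: scale each share by inverse class frequency so
    # the rarer class receives more label mass. Not the authors' formula.
    counts = np.asarray(class_counts, dtype=np.float64)
    w_a = (1.0 - lam) / counts[y_a]
    w_b = lam / counts[y_b]
    label = np.zeros(num_classes)
    label[y_a] += w_a / (w_a + w_b)
    label[y_b] += w_b / (w_a + w_b)
    return mixed, label
```

For example, pasting half of the joints from a sample of a 10-sample class into a sample of a 100-sample class yields a label that puts about 91% of its mass on the rarer class, whereas plain proportional mixing would split it 50/50.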
