Active Generation Network of Human Skeleton for Action Recognition

Long Liu, Xin Wang, Fangming Li, Jiayu Chen

TL;DR

The paper addresses two challenges in data generation for skeleton-based human action recognition (HAR): data scarcity and temporal inconsistency. It introduces the Active Generation Network (AGN), a framework that combines a Motion Generation Network (MGN) and an Uncertainty Metric Network (UMN). MGN performs motion-style transfer on skeleton graphs using ST-GCN-based encoders and decoders with Body-Part AdaIN and attention mechanisms, preserving category features while transferring morphology. UMN provides uncertainty-based sampling to guide generation. Training combines weighted reconstruction, cycle-consistency, and triplet losses to encourage realistic, category-consistent actions. Evaluations on NTU-RGB+D 60 show high-quality generated actions (FMD around 2–3; Acc often above 90%). Augmenting the training set with generated data improves action recognition performance and outperforms several prior methods, especially under data-scarce conditions.
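To make the Body-Part AdaIN idea more concrete, here is a minimal sketch in plain Python. It applies the standard AdaIN formula (normalize the content features, then rescale them with the style mean and standard deviation) separately to each group of joints. The joint grouping `BODY_PARTS`, the function names, and the use of one scalar feature per joint are illustrative assumptions; the summary does not specify how the paper defines its body parts or feature layout.

```python
import statistics

# Hypothetical grouping of joint indices into body parts (not from the paper).
BODY_PARTS = {"torso": [0, 1], "arms": [2, 3, 4], "legs": [5, 6, 7]}


def adain(content, style, eps=1e-5):
    """Standard AdaIN on one group of scalar features.

    Normalizes `content` to zero mean and unit variance, then rescales it
    with the mean and standard deviation of `style`.
    """
    c_mean, c_std = statistics.fmean(content), statistics.pstdev(content)
    s_mean, s_std = statistics.fmean(style), statistics.pstdev(style)
    return [s_std * (x - c_mean) / (c_std + eps) + s_mean for x in content]


def body_part_adain(content, style, parts=BODY_PARTS):
    """Apply AdaIN independently within each body part (assumed design)."""
    out = list(content)
    for joint_ids in parts.values():
        mixed = adain([content[i] for i in joint_ids],
                      [style[i] for i in joint_ids])
        for i, value in zip(joint_ids, mixed):
            out[i] = value
    return out


# One scalar feature per joint, purely for illustration.
source_features = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
target_features = [10.0, 12.0, 0.0, 0.0, 3.0, 1.0, 1.0, 1.0]
stylized = body_part_adain(source_features, target_features)
```

After the call, each body part of `stylized` keeps the relative ordering of the source features but takes on the mean and spread of the corresponding target part, which is the sense in which style statistics are transferred per body part.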

Abstract

Data generation is a data augmentation technique that improves the generalization ability of skeleton-based human action recognition models. Most existing data generation methods struggle to ensure the temporal consistency of an action's dynamic information. In addition, the data they generate lack diversity when only a few training samples are available. To solve these problems, we propose a novel active generative network (AGN), which adaptively learns various action categories by motion style transfer and generates new actions when only a single sample or a few samples are available for a particular action. The AGN consists of an action generation network and an uncertainty metric network (UMN). The former, with ST-GCN as the backbone, implicitly learns the morphological features of the target action while preserving the category features of the source action. The latter guides the generation of actions. Specifically, an action recognition model generates a prediction vector for each action, which is then scored using an uncertainty metric. Finally, the UMN provides the uncertainty sampling basis for the generated actions.
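The scoring step described above can be sketched as follows. This is a minimal illustration that assumes Shannon entropy of the softmax prediction vector as the uncertainty metric and assumes the most uncertain samples are the ones selected; the abstract names neither the metric nor the selection direction, and the function names here are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw classifier scores into a probability vector."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a less confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(prediction_logits, k):
    """Score each generated action and return the indices of the k most
    uncertain ones, together with all scores (assumed selection rule)."""
    scores = [entropy(softmax(logits)) for logits in prediction_logits]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k], scores


# Hypothetical recognizer outputs for three generated actions, three classes.
logits = [[5.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 1.0, 0.0]]
chosen, scores = select_most_uncertain(logits, k=2)
```

Here the second action receives the highest score because its prediction vector is uniform, so it is ranked first. Other uncertainty metrics, such as the margin between the top two class probabilities, could be substituted for `entropy` without changing the rest of the sketch.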

Paper Structure

This paper contains 12 sections, 8 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: The overall network architecture of our AGN framework.
  • Figure 2: Results generated by MGN on seen actions. (a) “Reach into Pocket”. (b) “Hopping”. (c) “Put Palms Together”. (d) “Bow”.
  • Figure 3: Results generated by MGN on unseen actions. (a) and (b) are “Drink water” (Unseen). (c) and (d) are “Kicking Something” (Seen). (e) and (f) are “Jump Up” (Unseen).
  • Figure 4: Action data projected into 2D space using t-SNE. Black points are source actions, red is a sample drawn from the source actions, cyan points are target actions, and green points are new actions generated from the red sample and some cyan samples. The green and black samples lie close together, which indicates that the generated actions conform, to some extent, to the distribution of the source actions.
  • Figure 5: Comparison results. We used four methods to generate a hand action (“Drinking Water”), a leg action (“Kicking Something”), and a whole-body action (“Jump Up”).