Recovering Complete Actions for Cross-dataset Skeleton Action Recognition

Hanchao Liu, Yujiang Li, Tai-Jiang Mu, Shi-Min Hu

TL;DR

To solve the skeleton action generalization problem, this paper presents a recover-and-resample augmentation framework based on a novel complete action prior that outperforms other domain generalization approaches by a considerable margin.

Abstract

Despite huge progress in skeleton-based action recognition, its generalizability to different domains remains a challenging issue. In this paper, to solve the skeleton action generalization problem, we present a recover-and-resample augmentation framework based on a novel complete action prior. We observe that human daily actions are confronted with temporal mismatch across different datasets, as they are usually partial observations of their complete action sequences. By recovering complete actions and resampling from these full sequences, we can generate strong augmentations for unseen domains. At the same time, we discover the nature of general action completeness within large datasets, indicated by the per-frame diversity over time. This allows us to exploit two assets of transferable knowledge that can be shared across action samples and be helpful for action completion: boundary poses for determining the action start, and linear temporal transforms for capturing global action patterns. Therefore, we formulate the recovering stage as a two-step stochastic action completion with boundary pose-conditioned extrapolation followed by smooth linear transforms. Both the boundary poses and linear transforms can be efficiently learned from the whole dataset via clustering. We validate our approach on a cross-dataset setting with three skeleton action datasets, outperforming other domain generalization approaches by a considerable margin.
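The recover-and-resample idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes skeleton clips are arrays of shape `(T, J, C)`, stands in plain linear interpolation for the boundary pose-conditioned extrapolation, treats each linear temporal transform as a `(T, T)` matrix applied along the time axis, and resamples by cropping a random window. All function names and the `max_infill` and `min_ratio` parameters are hypothetical.

```python
import numpy as np

def extrapolate_from_boundary(x, boundary_pose, t_p):
    """Prepend t_p frames moving from boundary_pose towards x[0].

    x: (T, J, C), boundary_pose: (J, C). Linear interpolation is an
    assumption standing in for the paper's extrapolation step.
    """
    if t_p == 0:
        return x
    # Weights 0, 1/t_p, ..., (t_p-1)/t_p: the first new frame is the boundary pose.
    alphas = (np.arange(t_p) / t_p)[:, None, None]
    infill = (1.0 - alphas) * boundary_pose[None] + alphas * x[:1]
    return np.concatenate([infill, x], axis=0)

def resize_time(x, length):
    """Linearly resample x along the time axis to `length` frames."""
    T = x.shape[0]
    src = np.linspace(0, T - 1, length)
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, T - 1)
    w = (src - lo)[:, None, None]
    return (1.0 - w) * x[lo] + w * x[hi]

def linear_temporal_transform(x, W):
    """Mix frames over time: out[t] = sum_s W[t, s] * x[s]."""
    return np.einsum("ts,sjc->tjc", W, x)

def recover_and_resample(x, boundary_poses, transforms, rng,
                         max_infill=16, min_ratio=0.5):
    """Produce one augmented clip x' with the same shape as x."""
    T = x.shape[0]
    # Step 1 (recover): extrapolate backwards from a sampled boundary pose.
    pose = boundary_poses[rng.integers(len(boundary_poses))]
    t_p = int(rng.integers(0, max_infill + 1))
    full = resize_time(extrapolate_from_boundary(x, pose, t_p), T)
    # Step 2 (recover): apply a sampled linear temporal transform.
    W = transforms[rng.integers(len(transforms))]
    full = linear_temporal_transform(full, W)
    # Step 3 (resample): crop a random window and stretch it back to T frames.
    win = int(rng.integers(int(min_ratio * T), T + 1))
    start = int(rng.integers(0, T - win + 1))
    return resize_time(full[start:start + win], T)
```

In such a setup, both the original clip `x` and the augmented clip returned by `recover_and_resample` would be passed to the classifier during training.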

Paper Structure

This paper contains 18 sections, 6 equations, 9 figures, 19 tables, 1 algorithm.

Figures (9)

  • Figure 1: (a) Cross-dataset skeleton action recognition. Taking the action "phone calling" as an example, temporal mismatch across datasets poses a challenging issue. (b) Complete action prior. Human actions within large datasets exhibit a statistical pattern that moves from less feature diversity to more diversity over time, implying the nature of action completeness (shown for the NTU, PKU and ETRI datasets). (c) Recover and Resample. After learning a stochastic action completion function from the training data, we recover complete actions and resample from them to further augment the training set.
  • Figure 2: Overview of Recovering and Resampling. Given training set $\mathcal{S}$, we learn boundary poses $\{p_i\}$ and context-aware linear transforms $\{W_i\}$ via clustering. For a sample $x$ from $\mathcal{S}$, we first do extrapolation ($\mathcal{F_N}$) conditioning on the boundary pose $p'$ with infilling length $t_p$, and then perform linear transform ($\mathcal{F_L}$) by sampling from $\{W_i\}$. The new data points $x'$ are resampled from recovered complete actions as strong augmentations for unseen datasets. Skeletons in dark blue rectangles are new frames generated by $\mathcal{F_N}$ and $\mathcal{F_L}$. Both $x$ and $x'$ are used for training the classifier.
  • Figure 3: Visualization for selected linear transform matrices $\{W_i\}$ via clustering using training sets $N$ and $E_A$.
  • Figure 4: Visualization for the boundary pose clustering result $\{p_i\}$ when $N_{\text{bkg}}=5$. (a) Pose clusters for training set $N$ and (b) Pose clusters for training set $E_A$.
  • Figure 5: Examples of recovered complete actions. The skeletons in blue are raw inputs and the skeletons in sky blue are new frames generated by our method.
  • ...and 4 more figures
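
The captions for Figures 2 and 4 mention boundary poses $\{p_i\}$ learned via clustering, with $N_{\text{bkg}}=5$ clusters in the visualization. A rough sketch of how such clustering could look is given below. It assumes, without confirmation from the source, that the candidate poses are the first frame of each training clip and that plain k-means (Lloyd's algorithm) is used; `kmeans` and `learn_boundary_poses` are hypothetical names.

```python
import numpy as np

def kmeans(points, k, rng, n_iter=50):
    """Plain Lloyd's k-means. points: (N, D). Returns (centers, labels)."""
    # Initialise centers from k distinct data points.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for i in range(k):
            members = points[labels == i]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[i] = members.mean(axis=0)
    # Final assignment against the updated centers.
    dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return centers, dist.argmin(axis=1)

def learn_boundary_poses(dataset, k=5, seed=0):
    """Cluster first-frame poses into k boundary poses.

    dataset: (N, T, J, C). Using the first frame of each clip is an
    assumption made for this sketch, not a detail taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, _, j, c = dataset.shape
    first_frames = dataset[:, 0].reshape(n, j * c)
    centers, _ = kmeans(first_frames, k, rng)
    return centers.reshape(k, j, c)
```

The returned array of shape `(k, J, C)` could then serve as the pool from which a boundary pose $p'$ is sampled during extrapolation.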