Expressive Keypoints for Skeleton-based Action Recognition via Skeleton Transformation

Yijie Yang, Jinlu Zhang, Jiaxu Zhang, Zhigang Tu

TL;DR

This work proposes Expressive Keypoints, which incorporate hand and foot details to form a fine-grained skeletal representation, improving the ability of existing models to discern intricate actions.

Abstract

In the realm of skeleton-based action recognition, traditional methods that rely on coarse body keypoints fall short of capturing subtle human actions. In this work, we propose Expressive Keypoints, which incorporate hand and foot details to form a fine-grained skeletal representation, improving the ability of existing models to discern intricate actions. To efficiently model Expressive Keypoints, the Skeleton Transformation strategy is presented to gradually downsample the keypoints and prioritize prominent joints by allocating importance weights. Additionally, a plug-and-play Instance Pooling module is exploited to extend our approach to multi-person scenarios without a surge in computation cost. Extensive experimental results over seven datasets demonstrate the superiority of our method compared to the state of the art for skeleton-based human action recognition. Code is available at https://github.com/YijieYang23/SkeleT-GCN.
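The Skeleton Transformation idea described above (re-weighting joints by importance and gradually downsampling them) can be illustrated with a minimal sketch. Everything below is an assumption-laden toy: the `skeleton_transform` function, the joint groups, and the uniform weights are placeholders for illustration, not the paper's learned parameters or actual operator.

```python
import numpy as np

def skeleton_transform(features, groups, importance):
    """Toy sketch: weighted pooling of joint groups.

    features:   (J, C) per-joint feature vectors
    groups:     list of joint-index lists (pre-defined partition)
    importance: (J,) per-joint importance weights
    Returns (len(groups), C): one pooled joint per group.
    """
    out = []
    for g in groups:
        w = importance[g]
        w = w / w.sum()                                     # normalize within the group
        out.append((features[g] * w[:, None]).sum(axis=0))  # weighted pool
    return np.stack(out)

feats = np.arange(12, dtype=float).reshape(6, 2)  # 6 joints, 2 channels (toy values)
groups = [[0, 1, 2], [3, 4, 5]]                   # downsample 6 joints -> 2 joints
weights = np.ones(6)                              # uniform weights for the demo
coarse = skeleton_transform(feats, groups, weights)
print(coarse.shape)  # (2, 2)
```

With uniform weights this reduces to group-wise mean pooling; in the paper the weights are produced by the network so prominent joints dominate each pooled group.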

Paper Structure

This paper contains 28 sections, 9 equations, 13 figures, and 16 tables.

Figures (13)

  • Figure 1: (a). Various representations of the same actions. (b). Accuracy and efficiency comparison of our method and the representative methods on NTU-60 (top) and NTU-120 (bottom).
  • Figure 2: Overview of the proposed pipeline. (a). We use a top-down estimator to extract COCO-WholeBody keypoints from videos, and conduct keypoint selection based on statistical metrics to remove the redundant facial keypoints, forming our Expressive Keypoints representation. (b). We propose the Skeleton Transformation strategy, which can be integrated into most GCN methods to efficiently process Expressive Keypoints. It guides the network to alter the skeletal features in groups by re-weighting and gradually downsampling the keypoints. (c). We implement an Instance Pooling module to fuse multiple instances at an early stage. We use it as a lightweight extension for evaluating our method in general in-the-wild scenarios that contain multi-person group activities.
  • Figure 3: The architecture of the Grouped Mapping Framework $\hat{\mathcal{F}}$. The graph convolution layer $\mathcal{G}$ and temporal convolution layer $\mathcal{T}$ of most GCN-based methods can be adopted.
  • Figure 4: Pre-defined keypoint partition.
  • Figure 5: IP module.
  • ...and 8 more figures
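The Instance Pooling module shown in Figure 2(c) and Figure 5 fuses multiple person instances early, so the backbone runs once regardless of how many people appear. A minimal sketch of that idea follows; the max-pooling choice over the instance axis and all tensor sizes are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def instance_pool(x):
    """Toy sketch of early instance fusion.

    x: (M, T, J, C) features for M person instances over T frames,
       J joints, C channels. Returns (T, J, C) by max-pooling over
    the instance axis, so downstream compute is independent of M.
    """
    return x.max(axis=0)

rng = np.random.default_rng(0)
x = rng.random((4, 16, 32, 3))   # 4 people; joint count 32 is a placeholder
fused = instance_pool(x)
print(fused.shape)  # (16, 32, 3)
```

The key property is that the fused tensor's shape has no instance dimension, which is what keeps multi-person scenes from multiplying the computation cost.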