GCN-DevLSTM: Path Development for Skeleton-Based Action Recognition

Lei Jiang, Weixin Yang, Xin Zhang, Hao Ni

TL;DR

The paper tackles skeleton-based action recognition (SAR) by enhancing temporal modeling in graph-based architectures. It introduces the G-Dev layer, based on path development on temporal graphs, and integrates it into a GCN-DevLSTM network that extracts temporal features effectively while reducing the time dimension. Experiments on Chalearn2013, NTU60, and NTU120 show state-of-the-art accuracy and robustness to irregular sampling and missing data, and the plug-and-play design works with different GCN backbones. The work provides a generic temporal-graph module with publicly released code, offering a versatile approach for SAR and related sequential graph data tasks.

Abstract

Skeleton-based action recognition (SAR) in videos is an important but challenging task in computer vision. Recent state-of-the-art (SOTA) models for SAR are primarily based on graph convolutional neural networks (GCNs), which are powerful at extracting the spatial information in skeleton data. However, it is not yet clear whether such GCN-based models can effectively capture the temporal dynamics of human action sequences. To this end, we propose the G-Dev layer, which exploits path development, a principled and parsimonious representation of sequential data that leverages Lie group structure. By integrating the G-Dev layer, the hybrid G-DevLSTM module enhances the traditional LSTM so that it reduces the time dimension while retaining high-frequency information. It can be conveniently applied to any temporal graph data, complementing existing advanced GCN-based models. Our empirical studies on the NTU60, NTU120 and Chalearn2013 datasets demonstrate that our proposed GCN-DevLSTM network consistently improves on strong GCN baseline models and achieves SOTA results with superior robustness in SAR tasks. The code is available at https://github.com/DeepIntoStreams/GCN-DevLSTM.
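To make "reduce the time dimension" concrete, here is a minimal sketch of one way a development-based layer can shorten a sequence. It assumes a sliding window over frames (analogous to a 1D convolution with hypothetical parameters `window` and `stride`), developments into the rotation group SO(m) through skew-symmetric generators, and linear interpolation between consecutive frames. All function names, the windowing scheme and the choice of SO(m) are illustrative assumptions, not the authors' implementation; the generators are random here, whereas a trained layer would learn them.

```python
import numpy as np


def expm(a: np.ndarray, terms: int = 20) -> np.ndarray:
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(a, ord=1)
    # Scale a by 2**-s so that its norm is at most 0.5, sum the series, then square s times.
    s = int(np.ceil(np.log2(norm / 0.5))) if norm > 0.5 else 0
    scaled = a / 2.0**s
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k
        result = result + term
    for _ in range(s):
        result = result @ result
    return result


def skew_generators(weights: np.ndarray) -> np.ndarray:
    """Hypothetical parameterisation: unconstrained (d, m, m) weights -> d matrices in so(m)."""
    return weights - np.transpose(weights, (0, 2, 1))


def development(path: np.ndarray, generators: np.ndarray) -> np.ndarray:
    """Development of a piecewise-linear path of shape (L, d): ordered product of exp(M(dx))."""
    z = np.eye(generators.shape[1])
    for dx in np.diff(path, axis=0):
        # M(dx) = sum_k dx[k] * A_k is skew-symmetric, so exp(M(dx)) is a rotation in SO(m).
        z = z @ expm(np.tensordot(dx, generators, axes=1))
    return z


def windowed_development(seq, generators, window, stride):
    """Hypothetical temporal reduction: (T, d) sequence -> (num_windows, m, m) group elements."""
    starts = range(0, seq.shape[0] - window + 1, stride)
    return np.stack([development(seq[start:start + window], generators) for start in starts])


rng = np.random.default_rng(0)
T, d, m = 64, 3, 4  # 64 frames of one joint's (x, y, z) coordinates, developed into SO(4)
seq = 0.1 * rng.normal(size=(T, d)).cumsum(axis=0)
gens = skew_generators(0.5 * rng.normal(size=(d, m, m)))
out = windowed_development(seq, gens, window=8, stride=4)
print(out.shape)  # (15, 4, 4): 64 frames reduced to 15 time steps
```

Each window is summarised by a single group element, so the time axis shrinks roughly by the stride, yet every increment inside the window still enters the ordered (non-commutative) product rather than being discarded by subsampling. This is one plausible reading of how such a layer can shorten sequences without simply dropping high-frequency detail.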

Paper Structure (26 sections, 11 equations, 5 figures, 9 tables)

Figures (5)

  • Figure 1: The work flow of G-Dev layer.
  • Figure 2: (a) The pipeline of our proposed approach consists of $N$ blocks, each containing a GCN module and a DevLSTM module. (b) Details of the DevLSTM module.
  • Figure 3: Robustness analysis on NTU60 X-sub benchmark.
  • Figure 4: Dual Graph. The left side shows the original skeleton representation in the NTU dataset; the right side shows its dual graph representation. Joint $V_{1-2}$ in the dual graph corresponds to the bone $B_{12}$ connecting joints $V_{1}$ and $V_{2}$ in the original graph.
  • Figure 5: Three GCN modules used in this paper. From left to right, the subfigures show the CTR-GC module, the adaptive graph convolution module, and the fixed graph.
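Figure 4 describes a dual graph in which each bone of the skeleton becomes a node. A minimal sketch of building such a graph, assuming it is the line graph of the skeleton (two bones are adjacent when they share a joint) and that a bone's feature is the difference of its endpoint coordinates, a common convention for bone streams. The paper's exact adjacency, bone orientation and joint indexing may differ, and the function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def dual_graph_edges(bones):
    """Dual (line) graph: node i is bones[i]; two nodes are linked if their bones share a joint."""
    return [(i, j) for i, j in combinations(range(len(bones)), 2)
            if set(bones[i]) & set(bones[j])]


def bone_features(joints, bones):
    """Map joint coordinates (T, J, C) to bone vectors (T, B, C): end joint minus start joint."""
    start = np.array([u for u, _ in bones])
    end = np.array([v for _, v in bones])
    return joints[:, end] - joints[:, start]


# Toy 4-joint skeleton (0-indexed): joint 1 is a hub connected to joints 0, 2 and 3.
bones = [(0, 1), (1, 2), (1, 3)]
print(dual_graph_edges(bones))  # [(0, 1), (0, 2), (1, 2)]: every bone touches joint 1

joints = np.zeros((2, 4, 3))  # 2 frames, 4 joints, (x, y, z) coordinates
joints[:, 2] = [1.0, 0.0, 0.0]
print(bone_features(joints, bones)[0, 1])  # bone (1, 2) = joint 2 - joint 1 -> [1. 0. 0.]
```

In this reading, the dual graph can be fed to the same GCN modules as the joint graph, with bone vectors playing the role of node features.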

Theorems & Definitions (4)

  • Definition 3.1: Path Development
  • Example 1: Linear path
  • Definition 3.2: Path development layer
  • Definition 3.3: G-Dev Sequence layer
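Definitions 3.1 and 3.2 and Example 1 are listed here only by title. In the path-development literature, the development of a path $x: [0, T] \to \mathbb{R}^d$ under a linear map $M: \mathbb{R}^d \to \mathfrak{g}$ into a matrix Lie algebra solves $dZ_t = Z_t\, M(dx_t)$ with $Z_0 = I$. For a piecewise-linear path this is an ordered product of matrix exponentials of $M$ applied to the increments, and for a linear path it collapses to $\exp(M(x_T - x_0))$. The sketch below checks that linear-path identity numerically, assuming $\mathfrak{g} = \mathfrak{so}(3)$ and $M(x) = \sum_k x^k A_k$ with two fixed rotation generators; a development layer in the sense of Definition 3.2 would instead learn the $A_k$. The names and notation are illustrative and may not match the paper's.

```python
import numpy as np


def expm(a, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(a, ord=1)
    s = int(np.ceil(np.log2(norm / 0.5))) if norm > 0.5 else 0
    scaled = a / 2.0**s
    result, term = np.eye(a.shape[0]), np.eye(a.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k
        result = result + term
    for _ in range(s):
        result = result @ result
    return result


def development(path, generators):
    """Ordered product of exp(M(dx)) over the increments of a piecewise-linear path (L, d)."""
    z = np.eye(generators.shape[1])
    for dx in np.diff(path, axis=0):
        z = z @ expm(np.tensordot(dx, generators, axes=1))
    return z


# A_1, A_2: standard so(3) generators of rotations about the z-axis and the x-axis.
gens = np.zeros((2, 3, 3))
gens[0, 0, 1], gens[0, 1, 0] = -1.0, 1.0
gens[1, 1, 2], gens[1, 2, 1] = -1.0, 1.0

x0, x1 = np.array([0.0, 0.0]), np.array([1.0, 1.0])
linear_path = x0 + np.linspace(0.0, 1.0, 50)[:, None] * (x1 - x0)  # 50 samples of a line
closed_form = expm(np.tensordot(x1 - x0, gens, axes=1))            # Example 1: exp(M(x1 - x0))
sampled = development(linear_path, gens)
corner_path = np.array([x0, [x1[0], x0[1]], x1])                   # same endpoints, other route

print(np.allclose(sampled, closed_form))                    # True: a line depends only on its increment
print(np.allclose(development(corner_path, gens), closed_form))  # False: rotations do not commute
```

The second comparison illustrates why development is informative about order: the corner path has the same total increment as the line, but its development differs because $A_1$ and $A_2$ do not commute.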