GCN-DevLSTM: Path Development for Skeleton-Based Action Recognition
Lei Jiang, Weixin Yang, Xin Zhang, Hao Ni
TL;DR
The paper tackles skeleton-based action recognition (SAR) by strengthening temporal modeling in graph-based architectures. It introduces the G-Dev layer, built on path development over temporal graphs, and integrates it into a GCN-DevLSTM network that extracts temporal features while reducing the time dimension. Experiments on Chalearn2013, NTU60, and NTU120 report state-of-the-art accuracy and robustness to irregular sampling and missing data, and the plug-and-play design works with different GCN backbones. The result is a generic temporal-graph module, with released code, that applies to SAR and related sequential graph data tasks.
Abstract
Skeleton-based action recognition (SAR) in videos is an important but challenging task in computer vision. The recent state-of-the-art (SOTA) models for SAR are primarily based on graph convolutional neural networks (GCNs), which are powerful in extracting the spatial information of skeleton data. However, it is not yet clear whether such GCN-based models can effectively capture the temporal dynamics of human action sequences. To this end, we propose the G-Dev layer, which exploits the path development, a principled and parsimonious representation for sequential data that leverages the Lie group structure. By integrating the G-Dev layer, the hybrid G-DevLSTM module enhances the traditional LSTM to reduce the time dimension while retaining high-frequency information. It can be conveniently applied to any temporal graph data, complementing existing advanced GCN-based models. Our empirical studies on the NTU60, NTU120 and Chalearn2013 datasets demonstrate that our proposed GCN-DevLSTM network consistently improves on strong GCN baseline models and achieves SOTA results with superior robustness in SAR tasks. The code is available at https://github.com/DeepIntoStreams/GCN-DevLSTM.
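
To illustrate the path development mentioned in the abstract, the sketch below uses the standard construction for a piecewise-linear path: each increment is mapped linearly into a matrix Lie algebra, exponentiated, and the resulting group elements are multiplied in order. It is a minimal NumPy sketch and not the authors' G-Dev layer. The choice of the special orthogonal group, the helper names `expm_taylor` and `path_development`, the random weights (which would be learned in practice), and all shapes are assumptions made for illustration.

```python
import numpy as np


def expm_taylor(a, terms=20, squarings=8):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    a = a / (2 ** squarings)          # scale down so the series converges quickly
    out = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms + 1):
        term = term @ a / k           # a^k / k!
        out = out + term
    for _ in range(squarings):        # undo the scaling: exp(a) = exp(a / 2^s)^(2^s)
        out = out @ out
    return out


def path_development(path, weights):
    """Develop a path of shape (T, d) into an n x n orthogonal matrix.

    weights has shape (d, n, n); each slice is projected onto the
    skew-symmetric matrices, i.e. the Lie algebra so(n) (an assumed choice).
    """
    skew = weights - np.transpose(weights, (0, 2, 1))
    dev = np.eye(weights.shape[1])
    for dx in np.diff(path, axis=0):
        # Linear map of the increment into the Lie algebra, shape (n, n).
        a = np.tensordot(dx, skew, axes=1)
        # Multiply by the corresponding group element, in time order.
        dev = dev @ expm_taylor(a)
    return dev


rng = np.random.default_rng(0)
path = rng.normal(size=(50, 3))       # hypothetical sequence: 50 steps, 3 channels
weights = rng.normal(size=(3, 4, 4))  # stand-in for learned parameters
dev = path_development(path, weights)
print(dev.shape)  # (4, 4)
```

Here a 50-step path with 3 channels is summarised by a single 4 x 4 matrix whose size does not depend on the sequence length. How the paper applies this construction to temporal graphs and combines it with the LSTM is not specified in this section.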
