High-Performance Inference Graph Convolutional Networks for Skeleton-Based Action Recognition

Junyi Wang, Ziao Li, Bangli Liu, Haibin Cai, Mohamad Saada, Qinggang Meng

TL;DR

The paper tackles real-time skeleton-based action recognition with graph convolutions, where state-of-the-art models rely on complex multi-branch topologies that hinder inference speed. It introduces re-parameterization (HPI-GCN-RP) and over-parameterization (HPI-GCN-OP), plus Rep-TCN, which preserves temporal modeling while enabling fast single-branch inference; adjacency matrix fusion further reduces computation. On the NTU-RGB+D 60/120 benchmarks, RP delivers up to 1.5× faster inference with higher accuracy, while OP achieves SOTA-competitive performance with around 5× faster inference, including K = 9 variants (9×1 temporal kernel) that match or exceed multi-stream baselines. The results demonstrate strong real-time performance gains with preserved or improved accuracy, and the methods generalize to other backbones, offering practical deployment potential; code is available at github.com/lizaowo/HPI-GCN.
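The core idea behind collapsing a multi-branch temporal convolution into a single branch can be illustrated with a minimal sketch. It assumes parallel single-channel temporal convolutions with odd kernel sizes (here 9, 5, and 1 taps), zero "same" padding, no nonlinearity inside the branches, and outputs that are simply summed. Under those assumptions the branches fuse into one 9-tap kernel: center-pad each smaller kernel to 9 taps and add them. The function names are hypothetical. The paper's actual Rep-TCN, which also includes a blending step, may use a different branch structure.

```python
def conv1d_same(x, k):
    """Single-channel temporal cross-correlation with zero padding so the
    output has the same length as the input (odd kernel length assumed)."""
    pad = len(k) // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(k[j] * xp[t + j] for j in range(len(k))) for t in range(len(x))]


def pad_kernel(k, size):
    """Zero-pad an odd-length kernel on both sides so it is centered in `size` taps."""
    extra = (size - len(k)) // 2
    return [0.0] * extra + list(k) + [0.0] * extra


def merge_branches(kernels, biases):
    """Fuse parallel 'same'-padded branches whose outputs are summed into a
    single kernel of the largest size plus a single bias (hypothetical helper)."""
    size = max(len(k) for k in kernels)
    merged = [0.0] * size
    for k in kernels:
        # Centering keeps every branch aligned on the same output frame.
        for i, w in enumerate(pad_kernel(k, size)):
            merged[i] += w
    return merged, sum(biases)
```

For example, `merge_branches([[1.0, 2.0, 3.0], [10.0]], [0.5, 0.5])` returns `([1.0, 12.0, 3.0], 1.0)`, because the 1-tap kernel lands on the center tap. The fusion works only because each branch is linear. Any per-branch normalization has to be folded into its kernel first.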

Abstract

Recently, significant achievements have been made in skeleton-based human action recognition with the emergence of graph convolutional networks (GCNs). However, state-of-the-art (SOTA) models for this task focus on constructing increasingly complex higher-order connections between joint nodes to describe skeleton information, which leads to complex inference processes and high computational costs. To address the slow inference speed caused by overly complex model structures, we introduce re-parameterization and over-parameterization techniques to GCNs and propose two novel high-performance inference GCNs, namely HPI-GCN-RP and HPI-GCN-OP. Once training is complete, the model parameters are fixed. HPI-GCN-RP adopts a re-parameterization technique to transform a high-performance training model into a fast inference model through linear transformations, achieving higher inference speed with competitive performance. HPI-GCN-OP further uses an over-parameterization technique to achieve greater performance gains by introducing additional inference parameters, at the cost of slightly lower inference speed. Experimental results on two skeleton-based action recognition datasets demonstrate the effectiveness of our approach. Our HPI-GCN-OP achieves performance comparable to current SOTA models while running inference five times faster. Specifically, HPI-GCN-OP achieves an accuracy of 93% on the cross-subject split of the NTU-RGB+D 60 dataset and 90.1% on the cross-subject benchmark of the NTU-RGB+D 120 dataset. Code is available at github.com/lizaowo/HPI-GCN.
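A common instance of the "linear transformation" used in re-parameterization is folding an inference-mode batch normalization into the convolution that precedes it. The minimal sketch below assumes a 1×1 convolution evaluated at a single position (equivalently, a linear layer), followed by BatchNorm with frozen running statistics. This is a standard technique shown for illustration; it is not claimed to be one of the paper's exact transformations, and all names are hypothetical.

```python
import math


def linear(x, W, b):
    """Per-output-channel affine map: y[o] = sum_i W[o][i] * x[i] + b[o]."""
    return [sum(w * xi for w, xi in zip(row, x)) + bo for row, bo in zip(W, b)]


def batch_norm(z, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm using frozen (running) mean and variance."""
    return [g * (zi - m) / math.sqrt(v + eps) + bt
            for zi, g, bt, m, v in zip(z, gamma, beta, mean, var)]


def fold_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    """Return (W_f, b_f) so that linear(x, W_f, b_f) equals
    batch_norm(linear(x, W, b), gamma, beta, mean, var, eps).

    Each output channel o is scaled by s = gamma[o] / sqrt(var[o] + eps):
    W_f[o] = s * W[o] and b_f[o] = s * (b[o] - mean[o]) + beta[o].
    """
    W_f, b_f = [], []
    for row, bo, g, bt, m, v in zip(W, b, gamma, beta, mean, var):
        s = g / math.sqrt(v + eps)
        W_f.append([w * s for w in row])
        b_f.append(s * (bo - m) + bt)
    return W_f, b_f
```

After folding, inference runs one affine operation instead of two. This is possible only because the BatchNorm statistics stop changing once training ends.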

Paper Structure (14 sections, 12 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Comparison of inference speeds for skeleton-based action analysis. Speed is measured in milliseconds per iteration on an NVIDIA RTX 3090 GPU with a batch size of 64 at full precision (fp32). The x-axis uses a base-10 logarithmic scale. Same-colored points at different positions represent models with different numbers of streams. The dataset is NTU-RGB+D 120 X-sub. K9 denotes a temporal convolution with a 9×1 kernel. Triangles represent our proposed HPI-GCN, and circles represent mainstream methods. Our performance curve shows a better speed-accuracy trade-off than those of the SOTA models. Please refer to Tab. 1 for more details.
  • Figure 2: The six re-parameterization methods. The left side is the training mode, and the right side is the inference mode. K represents the convolution kernel, and A represents the learnable adjacency matrix.
  • Figure 3: An overview of the HPI-GCN model, showing the training and inference modes of the HPI-GCN-RP basic block. A relatively complex training module is transformed into a simple inference module through re-parameterization. Because this transformation is linear, the inference module is mathematically equivalent to the training module, while having fewer parameters, lower computational cost, and higher inference speed.
  • Figure 4: The Re-Parameterized Temporal Convolution Network (Rep-TCN). Panel (1) shows the training structure, panels (2) and (3) show intermediate steps of the transformation, and panel (4) shows the inference structure. B denotes blending.
  • Figure 5: The two High-Performance Inference Graph Convolutions (HPI-GC). PA is a learnable adjacency matrix between the joints. Inference HPI-GC-RP is obtained from Training HPI-GC-RP via re-parameterization, and HPI-GC-OP is derived from Inference HPI-GC-RP via over-parameterization.
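A graph convolution is linear in its adjacency matrix. Consider branches that aggregate the same features with different adjacency matrices, for example a fixed skeleton adjacency A and the learnable PA, and that share one weight W. Once PA is fixed after training, these branches fuse into a single aggregation: A·X·W + PA·X·W = (A + PA)·X·W. The sketch below checks this identity on a toy 3-joint graph. The two-branch layout, the shared W, and the function names are assumptions for illustration, not the paper's exact HPI-GC definition.

```python
def matmul(P, Q):
    """Dense matrix product of nested lists (rows of P times columns of Q)."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def matadd(P, Q):
    """Element-wise sum of two equally shaped matrices."""
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def gc_two_branch(A, PA, X, W):
    """Training-style layer (hypothetical): aggregate joint features X (V x C_in)
    with A and PA separately, project both with W (C_in x C_out), then sum."""
    return matadd(matmul(matmul(A, X), W), matmul(matmul(PA, X), W))


def gc_fused(A_fused, X, W):
    """Inference-style layer (hypothetical): one aggregation with A + PA,
    precomputed once after training."""
    return matmul(matmul(A_fused, X), W)
```

In a real skeleton model, X also has a time dimension. The same fusion applies frame by frame, because the adjacency matrix acts only on the joint dimension.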