Topological Symmetry Enhanced Graph Convolution for Skeleton-Based Action Recognition

Zeyu Liang, Hailun Xia, Naichuan Zheng, Huan Xu

TL;DR

This paper proposes a Topological Symmetry Enhanced Graph Convolution (TSE-GC), which learns distinct topologies across channel partitions while incorporating topological symmetry awareness, and a Multi-Branch Deformable Temporal Convolution (MBDTC) for skeleton-based action recognition.

Abstract

Skeleton-based action recognition has achieved remarkable performance with the development of graph convolutional networks (GCNs). However, most of these methods construct complex topology learning mechanisms while neglecting the inherent symmetry of the human body. Additionally, temporal convolutions with fixed receptive fields are limited in their capacity to capture dependencies in time sequences. To address these issues, we (1) propose a novel Topological Symmetry Enhanced Graph Convolution (TSE-GC) that enables distinct topology learning across different channel partitions while incorporating topological symmetry awareness, and (2) construct a Multi-Branch Deformable Temporal Convolution (MBDTC) for skeleton-based action recognition. The proposed TSE-GC emphasizes the inherent symmetry of the human body while enabling efficient learning of dynamic topologies. Meanwhile, MBDTC introduces deformable modeling, which yields more flexible receptive fields and a stronger capacity for modeling temporal dependencies. Combining TSE-GC with MBDTC, our final model, TSE-GCN, achieves competitive performance with fewer parameters than state-of-the-art methods on three large datasets: NTU RGB+D, NTU RGB+D 120, and NW-UCLA. On the cross-subject and cross-set evaluations of NTU RGB+D 120, our model reaches accuracies of 90.0% and 91.1%, respectively, with 1.1M parameters and 1.38 GFLOPs for one stream.

Paper Structure

This paper contains 17 sections, 16 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Topology reactivation with symmetry awareness. The correlations between $v_6$ and $v_2$, $v_3$, $v_8$, $v_9$ in the shared topology are activated due to the scale mask derived from the correlation between $v_6$ and $v_3$. Darker colors and thicker lines stand for larger weights. Best viewed in color.
  • Figure 2: Architecture overview of the proposed TSE-GCN. PE denotes the learnable absolute positional embedding [chi2022infogcn]. L denotes the number of stacked layers. In the TSE-GC module, $f$ represents the generation function of the scale mask, where the indexes of the top-k neighbors are mapped to relevant scales. $\otimes$, $\odot$, and $\oplus$ denote matrix multiplication, element-wise multiplication, and element-wise sum, respectively. Best viewed in color.
  • Figure 3: Sampling mechanism in our DTC Module. The receptive fields for the frame enclosed by the red dashed line are enclosed by the blue dashed line with the offsets. We assume the offsets to be integers for clarity. Best viewed in color.
  • Figure 4: Accuracy difference (%) between TSE-GC and two representative GCs on symmetry-related classes generated by GPT-4 [achiam2023gpt]. Green bars indicate improvements.
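
The sampling mechanism described in the Figure 3 caption can be illustrated with a small sketch. This is not the authors' implementation: the function name `deformable_temporal_sample`, its parameters, the linear interpolation between neighboring frames, and the clamping at sequence borders are all assumptions made for illustration. In a trained model the offsets would presumably be predicted by a learned layer; here they are passed in directly. When the offsets are integers, as the caption assumes for clarity, the interpolation reduces to plain indexing.

```python
import numpy as np

def deformable_temporal_sample(x, offsets, kernel_size=3, dilation=1):
    """Sample frames at offset positions along the time axis (hypothetical helper).

    x:       (T, C) array, features of one joint over T frames.
    offsets: (T, kernel_size) array of (possibly fractional) frame offsets.
    Returns a (T, kernel_size, C) array of sampled features.
    """
    T, _ = x.shape
    # Regular grid of an ordinary temporal convolution, centered on each frame.
    base = (np.arange(kernel_size) - kernel_size // 2) * dilation
    # Deformed sampling positions: regular grid plus per-frame offsets.
    pos = np.arange(T)[:, None] + base[None, :] + offsets
    # Assumption: positions outside the sequence are clamped to its borders.
    pos = np.clip(pos, 0, T - 1)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, T - 1)
    w = (pos - lo)[..., None]
    # Assumption: linear interpolation between the two nearest frames.
    return (1.0 - w) * x[lo] + w * x[hi]

# Toy input: 5 frames, 1 channel, with value equal to the frame index.
x = np.arange(5, dtype=float)[:, None]
fixed = deformable_temporal_sample(x, np.zeros((5, 3)))
shifted = deformable_temporal_sample(x, np.full((5, 3), 0.5))
```

With zero offsets the receptive field is the fixed one of a standard temporal convolution; nonzero offsets shift each sampling position independently, which is the flexibility the deformable design aims for.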