An Improved Graph Pooling Network for Skeleton-Based Action Recognition

Cong Wu, Xiao-Jun Wu, Tianyang Xu, Josef Kittler

TL;DR

This work tackles the challenge of pooling in skeleton-based action recognition by introducing IGPN, a trainable, region-aware structural pooling framework. It combines a region-aware pooling mechanism with correlation-guided weighting, a Cross Fusion Block for multi-granularity feature fusion, and an Information Supplement Module that enriches the inputs. All three are implemented as a plug-and-play enhancement to existing GCN backbones. The approach yields significant accuracy improvements on benchmarks such as NTU-RGB+D 60/120 and UWA3D Multiview while reducing computational cost, and a heavier variant achieves the best performance. Extensive ablations and comparisons validate each component. They underscore the importance of preserving structural information during pooling and the benefits of input-level augmentation for skeleton graphs.
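The TL;DR says only that the Information Supplement Module enriches the inputs. The sketch below illustrates one common form of input-level enrichment in skeleton-based recognition: concatenating joint coordinates with bone vectors and frame-to-frame motion. This summary does not say whether IGPN's ISM works this way. The parent list `NTU_PARENTS` and the function `enrich_input` are hypothetical.

```python
import numpy as np

# Hypothetical 0-indexed parent of each joint in the 25-joint NTU-RGB+D layout.
# Joint 0 (spine base) is the root and points to itself, so its bone vector is zero.
NTU_PARENTS = np.array([0, 0, 20, 2, 20, 4, 5, 6, 20, 8, 9, 10,
                        0, 12, 13, 14, 0, 16, 17, 18, 1, 7, 7, 11, 11])


def enrich_input(x: np.ndarray, parents: np.ndarray = NTU_PARENTS) -> np.ndarray:
    """Hypothetical input enrichment: stack joint, bone and motion channels.

    x: skeleton sequence of shape (C, T, V) = (coordinates, frames, joints).
    Returns an array of shape (3 * C, T, V).
    """
    # Bone vector: each joint's position minus its parent joint's position.
    bone = x - x[:, :, parents]
    # Motion: difference between consecutive frames, zero at the last frame.
    motion = np.zeros_like(x)
    motion[:, :-1] = x[:, 1:] - x[:, :-1]
    # Concatenate along the channel axis.
    return np.concatenate([x, bone, motion], axis=0)
```

For a clip of shape `(3, 64, 25)`, `enrich_input` returns a `(9, 64, 25)` array. The enriched tensor can then be fed to any GCN backbone that accepts nine input channels.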

Abstract

Pooling is a crucial operation in computer vision, yet the unique structure of skeletons hinders the application of existing pooling strategies to skeleton graph modelling. In this paper, we propose an Improved Graph Pooling Network, referred to as IGPN. Its main innovations are as follows. First, it incorporates a region-aware pooling strategy based on structural partitioning. The correlation matrix of the original features is used to adaptively weight the information from different regions in the newly generated features, which makes the processing more flexible and effective. Second, to prevent the irreversible loss of discriminative information, we propose a cross fusion module and an information supplement module, which provide block-level and input-level information, respectively. As a plug-and-play structure, the proposed operation can be seamlessly combined with existing GCN-based models. Extensive evaluations on several challenging benchmarks demonstrate the effectiveness of the proposed solutions. For example, in the cross-subject evaluation of the NTU-RGB+D 60 dataset, IGPN achieves a significant improvement in accuracy over the baseline while reducing FLOPs by nearly 70%. A heavier version is also introduced to further boost accuracy.
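The abstract describes region-aware pooling only at a high level. The NumPy sketch below shows one possible reading of it, under explicit assumptions:

- The five-part joint partition `REGIONS` is hypothetical.
- Using the time-averaged feature correlation to score joints is a hypothetical choice.
- The per-region softmax weighting is also hypothetical.

None of these is the paper's formulation. A trained model would also apply learned projections rather than using raw features.

```python
import numpy as np

# Hypothetical 5-part partition of the 25 NTU-RGB+D joints (0-indexed).
REGIONS = [
    [0, 1, 2, 3, 20],        # torso and head
    [4, 5, 6, 7, 21, 22],    # left arm and hand
    [8, 9, 10, 11, 23, 24],  # right arm and hand
    [12, 13, 14, 15],        # left leg
    [16, 17, 18, 19],        # right leg
]


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def region_aware_pool(x: np.ndarray, regions=REGIONS) -> np.ndarray:
    """Pool joint features (C, T, V) into region features (C, T, R).

    Joint weights inside each region come from a feature correlation matrix.
    Joints that correlate strongly with the rest of the skeleton therefore
    contribute more to their region's pooled feature.
    """
    c = x.shape[0]
    f = x.mean(axis=1)                   # (C, V): time-averaged joint features
    corr = f.T @ f / np.sqrt(c)          # (V, V): scaled joint-to-joint correlation
    score = corr.mean(axis=1)            # (V,): one importance score per joint
    pooled = []
    for idx in regions:
        w = softmax(score[idx])          # weights sum to 1 within the region
        pooled.append(x[:, :, idx] @ w)  # (C, T): weighted sum over member joints
    return np.stack(pooled, axis=-1)     # (C, T, R)
```

Pooling 25 joints into 5 regions means that later graph layers operate on far fewer nodes. This shows where pooling can save computation, but the sketch does not attempt to reproduce the reported FLOP reduction.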

Paper Structure (16 sections, 16 equations, 7 figures, 11 tables)

Figures (7)

  • Figure 1: Structural Spatial Pooling. We divide the skeleton according to its spatial structure and assign different attention weights to different nodes to achieve adaptive structured spatial pooling. Dot size indicates the corresponding attention intensity.
  • Figure 2: Structure pooling strategy with region awareness. Based on the characteristics of the current feature itself, we design an adaptive feature pooling operation and automatically compute the relationship matrix of the current feature.
  • Figure 3: The Overall Structure. ISM stands for Information Supplement Module, CFB stands for Cross Fusion Block, and GCN, GAP, and FC correspond to graph convolution operations, global average pooling, and fully connected layers, respectively.
  • Figure 4: The process of spatial pooling.
  • Figure 5: The performance of mainstream algorithms on Cross-Subject of the NTU-RGB+D 60 dataset. (All results are obtained from the joint-stream network.)
  • Figures 6 and 7 are not listed here.
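Figure 3 places a Cross Fusion Block (CFB) in the pipeline, and the TL;DR describes it as fusing multi-granularity features. This summary does not give its internals, so the sketch below is hypothetical. It exchanges information between joint-level features of shape (C, T, V) and region-level features of shape (C, T, R) in two directions:

- It copies each region feature back to that region's member joints.
- It mean-pools joint features into their regions.

Each direction is mixed in with a fixed weight `alpha`, which a real network would more likely learn. The names `REGIONS`, `unpool`, `mean_pool`, and `cross_fuse` are all illustrative.

```python
import numpy as np

# Hypothetical 5-part partition of the 25 NTU-RGB+D joints (0-indexed).
REGIONS = [[0, 1, 2, 3, 20], [4, 5, 6, 7, 21, 22], [8, 9, 10, 11, 23, 24],
           [12, 13, 14, 15], [16, 17, 18, 19]]


def unpool(region_feat: np.ndarray, regions, num_joints: int) -> np.ndarray:
    """(C, T, R) -> (C, T, V): copy each region feature to its member joints."""
    c, t, _ = region_feat.shape
    out = np.zeros((c, t, num_joints))
    for r, idx in enumerate(regions):
        out[:, :, idx] = region_feat[:, :, r:r + 1]  # broadcast over member joints
    return out


def mean_pool(joint_feat: np.ndarray, regions) -> np.ndarray:
    """(C, T, V) -> (C, T, R): plain average of member joints per region."""
    return np.stack([joint_feat[:, :, idx].mean(axis=-1) for idx in regions], axis=-1)


def cross_fuse(joint_feat, region_feat, regions=REGIONS, alpha=0.5):
    """Hypothetical cross fusion: each granularity receives the other's information."""
    fused_joint = joint_feat + alpha * unpool(region_feat, regions, joint_feat.shape[-1])
    fused_region = region_feat + alpha * mean_pool(joint_feat, regions)
    return fused_joint, fused_region
```

Each output keeps the shape of its input, so in this reading the block could be inserted between existing GCN stages without changing their interfaces.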