An Improved Graph Pooling Network for Skeleton-Based Action Recognition
Cong Wu, Xiao-Jun Wu, Tianyang Xu, Josef Kittler
TL;DR
This work tackles the challenge of pooling in skeleton-based action recognition by introducing IGPN, a trainable, region-aware structure pooling framework. It combines a region-aware pooling mechanism with correlation-guided weighting, a Cross Fusion Block for multi-granularity feature fusion, and an Information Supplement Module to enrich inputs, all implemented as a plug-and-play enhancement to existing GCN backbones. The approach yields significant accuracy improvements on benchmarks like NTU-RGB+D 60/120 and UWA3D Multiview while reducing computational cost, with a heavier variant achieving the best performance. The proposed components are validated through extensive ablations and comparisons, underscoring the importance of preserving structural information during pooling and the benefits of input-level augmentation for skeleton graphs.
Abstract
Pooling is a crucial operation in computer vision, yet the unique structure of skeletons hinders the application of existing pooling strategies to skeleton graph modelling. In this paper, we propose an Improved Graph Pooling Network, referred to as IGPN. Its main innovations are as follows. Our method incorporates a region-aware pooling strategy based on structural partitioning. The correlation matrix of the original features is used to adaptively adjust the weight of information from different regions in the newly generated features, resulting in more flexible and effective processing. To prevent the irreversible loss of discriminative information, we propose a cross fusion module and an information supplement module, which provide block-level and input-level information, respectively. As a plug-and-play structure, the proposed operation can be seamlessly combined with existing GCN-based models. We conduct extensive evaluations on several challenging benchmarks, and the experimental results demonstrate the effectiveness of the proposed solutions. For example, in the cross-subject evaluation on the NTU-RGB+D 60 dataset, IGPN achieves a significant improvement in accuracy over the baseline while reducing FLOPs by nearly 70%; we also introduce a heavier version that further boosts accuracy.
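To make the idea of region-aware pooling with correlation-guided weighting more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a joint-to-joint dot-product correlation, a softmax over per-joint scores within each region, and a hypothetical partition of a toy five-joint skeleton; the function name `region_aware_pool` and all shapes are illustrative choices rather than details taken from the paper.

```python
import numpy as np

def region_aware_pool(x, regions):
    """Pool joint features into region features (illustrative sketch only).

    x:       (V, C) array, one C-dimensional feature per joint.
    regions: list of lists of joint indices (an assumed structural partition).
    Returns an (R, C) array, one pooled feature per region.
    """
    # Assumption: correlation is the plain dot product between joint features.
    corr = x @ x.T                                  # (V, V)
    pooled = []
    for idx in regions:
        sub = corr[np.ix_(idx, idx)]                # correlations within the region
        score = sub.mean(axis=1)                    # one score per joint in the region
        score = score - score.max()                 # stabilise the softmax
        w = np.exp(score) / np.exp(score).sum()     # weights sum to 1
        pooled.append(w @ x[idx])                   # weighted average of joint features
    return np.stack(pooled)

# Toy example: 5 joints, 4 channels, 2 hypothetical regions.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 4))
coarse = region_aware_pool(features, [[0, 1], [2, 3, 4]])
print(coarse.shape)
```

Because each region's output is a convex combination of its own joints, the coarse graph keeps the body-part structure of the partition while the weights adapt to the input features. A fixed average pool would be the special case in which all weights within a region are equal.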
