Flexible graph convolutional network for 3D human pose estimation

Abu Taib Mohammed Shahjahan; A. Ben Hamza

TL;DR

Traditional GCNs for 3D human pose estimation aggregate features only from one-hop neighbors, which limits their ability to resolve depth ambiguity and occlusion. Flex-GCN introduces a flexible graph convolution that aggregates 1- and 2-hop information via the propagation operator $P = ((1-s)\mathbf{I} + s \hat{\mathbf{A}}) \hat{\mathbf{A}} = (1-s)\hat{\mathbf{A}} + s \hat{\mathbf{A}}^2$, augmented with an initial residual path and a learnable adjacency modulation $\check{\mathbf{A}} = \hat{\mathbf{A}} + \mathbf{Q}$. These layers are arranged in a ConvNeXt-inspired residual architecture that also includes a Global Response Normalization layer, and the training loss $\mathcal{L}$ combines $L_2$ and $L_1$ penalties to supervise the 3D pose predictions. The model keeps the same time and memory complexity as a standard GCN while learning richer, globally informed representations, and achieves competitive results on Human3.6M and MPI-INF-3DHP; ablations confirm the positive impact of the residual connection and the symmetric modulation. The method benefits from explicit multi-hop information propagation and learned long-range skeletal relationships, which suggests a scalable approach to robust 3D pose estimation under occlusion and across datasets, with potential applicability to other graph-based vision tasks.
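To make the propagation operator concrete, the following NumPy sketch builds $P$ for a toy graph and checks that the factored and expanded forms agree. It is an illustration under stated assumptions, not the authors' implementation: the 4-joint chain, the value `s = 0.3`, the symmetric normalization with self-loops used for $\hat{\mathbf{A}}$, and the symmetrization of $\mathbf{Q}$ as $(\mathbf{Q} + \mathbf{Q}^\top)/2$ are all choices made here for the example.

```python
import numpy as np

def normalized_adjacency(edges, num_nodes):
    """Assumed normalization: D^{-1/2} (A + I) D^{-1/2} for an undirected graph."""
    A = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    A_tilde = A + np.eye(num_nodes)                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def propagation_operator(A_hat, s):
    """P = (1 - s) * A_hat + s * A_hat^2, mixing 1-hop and 2-hop aggregation."""
    return (1.0 - s) * A_hat + s * (A_hat @ A_hat)

# Hypothetical 4-joint chain 0-1-2-3 standing in for a skeleton fragment.
edges = [(0, 1), (1, 2), (2, 3)]
A_hat = normalized_adjacency(edges, 4)
s = 0.3                                             # illustrative mixing weight

P = propagation_operator(A_hat, s)
P_factored = ((1.0 - s) * np.eye(4) + s * A_hat) @ A_hat

# Adjacency modulation A_check = A_hat + Q, with Q symmetrized (an assumption).
rng = np.random.default_rng(0)
Q_raw = 0.01 * rng.standard_normal((4, 4))          # stand-in for a learned matrix
Q = 0.5 * (Q_raw + Q_raw.T)
A_check = A_hat + Q
```

In this toy graph, joints 0 and 2 are not adjacent, so the entry of $\hat{\mathbf{A}}$ linking them is zero while the corresponding entry of $P$ is positive: the $\hat{\mathbf{A}}^2$ term is what lets a joint receive features from its second-order neighbors. Joints 0 and 3 are three hops apart and remain disconnected in $P$.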

Abstract

Although graph convolutional networks exhibit promising performance in 3D human pose estimation, their reliance on one-hop neighbors limits their ability to capture high-order dependencies among body joints, crucial for mitigating uncertainty arising from occlusion or depth ambiguity. To tackle this limitation, we introduce Flex-GCN, a flexible graph convolutional network designed to learn graph representations that capture broader global information and dependencies. At its core is the flexible graph convolution, which aggregates features from both immediate and second-order neighbors of each node, while maintaining the same time and memory complexity as the standard convolution. Our network architecture comprises residual blocks of flexible graph convolutional layers, as well as a global response normalization layer for global feature aggregation, normalization and calibration. Quantitative and qualitative results demonstrate the effectiveness of our model, achieving competitive performance on benchmark datasets.
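The three roles the abstract assigns to the global response normalization layer (aggregation, normalization, calibration) can be sketched as follows. This follows the GRN formulation from ConvNeXt V2, adapted here to a joints-by-channels feature matrix; that adaptation, the function name `global_response_norm`, and the 17-joint, 8-channel input are assumptions for illustration and may differ from the layer used in Flex-GCN.

```python
import numpy as np

def global_response_norm(X, gamma, beta, eps=1e-6):
    """GRN sketch for node features X of shape (num_joints, num_channels)."""
    # 1) Global feature aggregation: L2 norm of each channel over all joints.
    G = np.linalg.norm(X, axis=0, keepdims=True)        # shape (1, C)
    # 2) Feature normalization: divide by the mean response across channels.
    N = G / (G.mean(axis=1, keepdims=True) + eps)       # shape (1, C)
    # 3) Feature calibration: rescale the input, then apply a learnable
    #    affine transform (gamma, beta) and a skip connection.
    return gamma * (X * N) + beta + X

# Hypothetical input: 17 joints with 8 feature channels each.
rng = np.random.default_rng(0)
X = rng.standard_normal((17, 8))
gamma = np.zeros((1, 8))   # zero initialization makes the layer an identity map
beta = np.zeros((1, 8))
Y = global_response_norm(X, gamma, beta)
```

With `gamma` and `beta` initialized to zero, the layer starts as an identity mapping and learns how strongly to calibrate each channel during training.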

Paper Structure (10 sections, 2 theorems, 6 equations, 3 figures, 5 tables)

Introduction
Related Work
Method
Preliminaries and Problem Statement
Flexible Graph Convolutional Network
Experiments
Experimental Setup
Results and Analysis
Ablation Study
Conclusion

Key Result

Lemma 1

If two matrices $\bm{M}_{1}$ and $\bm{M}_{2}$ commute, i.e., $\bm{M}_{1}\bm{M}_{2}=\bm{M}_{2}\bm{M}_{1}$, then […], where $\rho(\cdot)$ denotes the spectral radius of a matrix (i.e., the largest absolute value of its eigenvalues).
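The commutation hypothesis and the quantity $\rho(\cdot)$ are easy to check numerically. The sketch below is an illustration only: it computes the spectral radius from the eigenvalues and verifies that the two factors of the propagation operator, $(1-s)\mathbf{I} + s\hat{\mathbf{A}}$ and $\hat{\mathbf{A}}$, commute. The 3-node chain, the symmetric normalization with self-loops, and `s = 0.5` are assumptions made for the example.

```python
import numpy as np

def spectral_radius(M):
    """Largest absolute value among the eigenvalues of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def commute(M1, M2, tol=1e-10):
    """True if M1 @ M2 equals M2 @ M1 up to a numerical tolerance."""
    return bool(np.allclose(M1 @ M2, M2 @ M1, atol=tol))

# Toy 3-node chain with self-loops, symmetrically normalized (assumed form).
A_tilde = np.array([[1.0, 1.0, 0.0],
                    [1.0, 1.0, 1.0],
                    [0.0, 1.0, 1.0]])
deg = A_tilde.sum(axis=1)
A_hat = A_tilde / np.sqrt(np.outer(deg, deg))

s = 0.5                                   # illustrative mixing weight
M1 = (1.0 - s) * np.eye(3) + s * A_hat    # first factor of P
M2 = A_hat                                # second factor of P

factors_commute = commute(M1, M2)
rho_A_hat = spectral_radius(A_hat)
rho_P = spectral_radius(M1 @ M2)
```

The factors commute because $M_1$ is a polynomial in $\hat{\mathbf{A}}$. For this normalization the spectral radius of $\hat{\mathbf{A}}$ is 1, and the product $P = M_1 M_2$ has a spectral radius no larger than 1 in this example.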

Figures (3)

Figure 1: Network architecture of Flex-GCN for 3D human pose estimation.
Figure 2: Visual comparison between Flex-GCN and Modulated GCN on sample actions from the Human3.6M dataset.
Figure 3: Performance of our proposed Flex-GCN model on the Human3.6M dataset using varying batch and filter sizes.

Theorems & Definitions (2)

Lemma 1
Proposition 1
