Adaptive graph Kolmogorov-Arnold network for 3D human pose estimation

Abu Taib Mohammed Shahjahan, A. Ben Hamza

TL;DR

PoseKAN is an adaptive graph Kolmogorov-Arnold Network for 3D human pose estimation that replaces fixed node-wise activation functions with learnable edge-wise functions and uses multi-hop propagation to capture long-range skeletal dependencies. It combines a spectral modulation filter with the propagation matrix $\mathbf{P}=(1-s)\hat{\mathbf{A}}+s\hat{\mathbf{A}}^{2}$, which blends a one-hop term ($\hat{\mathbf{A}}$) and a two-hop term ($\hat{\mathbf{A}}^{2}$), to mitigate spectral bias and increase expressiveness in 2D-to-3D lifting from a single image. The architecture also uses residual blocks and global response normalization. The model is trained with an elastic-net-inspired objective that combines squared $L_2$ and $L_1$ terms, $L = \frac{1}{N} \Bigl[(1-\alpha) \sum_{i=1}^{N} \| \mathbf{y}_i - \hat{\mathbf{y}}_i \|_2^2 + \alpha \sum_{i=1}^{N} \| \mathbf{y}_i - \hat{\mathbf{y}}_i \|_1 \Bigr]$. It achieves performance competitive with state-of-the-art methods on Human3.6M and generalizes well to MPI-INF-3DHP, while keeping a compact parameter budget of about $5.72$M. These results suggest improved robustness to occlusions and depth ambiguities, with potential for extension to multi-person pose estimation and broader graph-based tasks.
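To make the training objective concrete, here is a minimal NumPy sketch of the combined loss as written in the TL;DR. The function name `elastic_pose_loss`, the default `alpha=0.5`, and the reading of $N$ as the number of rows (for example joints) in an `(N, 3)` array are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def elastic_pose_loss(y_true, y_pred, alpha=0.5):
    """Combined squared-L2 / L1 loss, averaged over N items.

    L = (1/N) * [(1 - alpha) * sum_i ||y_i - yhat_i||_2^2
                 + alpha     * sum_i ||y_i - yhat_i||_1]

    y_true, y_pred: arrays of shape (N, 3). Treating N as the number
    of rows is an assumption of this sketch.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_true - y_pred
    n = diff.shape[0]
    squared_l2 = np.sum(diff ** 2)   # sum over i of ||.||_2^2
    l1 = np.sum(np.abs(diff))        # sum over i of ||.||_1
    return ((1.0 - alpha) * squared_l2 + alpha * l1) / n
```

Setting `alpha=0` recovers a pure mean squared-error term and `alpha=1` a pure absolute-error term, so `alpha` trades sensitivity to large errors against robustness to outliers.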

Abstract

Graph convolutional network (GCN)-based methods have shown strong performance in 3D human pose estimation by leveraging the natural graph structure of the human skeleton. However, their local receptive field limits their ability to capture the long-range dependencies that are essential for handling occlusions and depth ambiguities. They also exhibit spectral bias, prioritizing low-frequency components while struggling to model high-frequency details. In this paper, we introduce PoseKAN, an adaptive graph Kolmogorov-Arnold network (KAN) framework that extends KANs to graph-based learning for 2D-to-3D pose lifting from a single image. Unlike GCNs, which use fixed activation functions, KANs employ learnable functions on graph edges, allowing data-driven, adaptive feature transformations. This improves the model's adaptability and expressiveness in learning complex pose variations. Our model employs multi-hop feature aggregation, so that body joints can leverage information from both local and distant neighbors, leading to improved spatial awareness. It also incorporates residual PoseKAN blocks for deeper feature refinement and a global response normalization for improved feature selectivity and contrast. Extensive experiments on benchmark datasets demonstrate the competitive performance of our model against state-of-the-art methods.
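The multi-hop aggregation described above can be illustrated with the propagation matrix $\mathbf{P}=(1-s)\hat{\mathbf{A}}+s\hat{\mathbf{A}}^{2}$ given in the TL;DR. The sketch below is a hedged illustration, not the authors' implementation: it assumes $\hat{\mathbf{A}}$ is the symmetrically normalized adjacency matrix with self-loops, as is common in GCNs, and it uses a toy 4-joint chain instead of a real skeleton. The helper names are hypothetical.

```python
import numpy as np

def normalized_adjacency(edges, num_nodes):
    """Symmetrically normalized adjacency with self-loops.

    Assumption of this sketch: A_hat = D^{-1/2} (A + I) D^{-1/2}.
    The source does not specify the normalization.
    """
    a = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0          # undirected skeleton edge
    a += np.eye(num_nodes)               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def propagation_matrix(a_hat, s):
    """P = (1 - s) * A_hat + s * A_hat^2 (one-hop and two-hop mix)."""
    return (1.0 - s) * a_hat + s * (a_hat @ a_hat)

# Toy chain of four joints: 0 - 1 - 2 - 3 (hypothetical, not a real skeleton).
a_hat = normalized_adjacency([(0, 1), (1, 2), (2, 3)], num_nodes=4)
p_one_hop = propagation_matrix(a_hat, s=0.0)   # reduces to A_hat
p_mixed = propagation_matrix(a_hat, s=0.5)     # adds two-hop links

# Propagating features X of shape (J, F) is then a matrix product.
x = np.random.default_rng(0).normal(size=(4, 8))
h = p_mixed @ x
```

With `s=0.0`, joint 0 receives nothing from joint 2; with `s=0.5`, the entry `p_mixed[0, 2]` becomes positive, so joints two hops apart exchange information in a single propagation step, while joints three hops apart (0 and 3) remain unconnected.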

Paper Structure

This paper contains 15 sections, 13 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Overview of Model Architecture. The model takes 2D pose coordinates as input and produces 3D pose predictions as output, where $J$ is the number of joints and $F$ is the embedding dimension. The architecture consists of a start PoseKAN layer, four residual FG-GCN blocks, and an end PoseKAN layer. A global response normalization is also used.
  • Figure 2: Visual comparison between PoseKAN and GraphMLP on sample actions from the Human3.6M dataset.
  • Figure 3: Effect of spline order and grid size.
  • Figure 4: Effect of scaling factor.
  • Figure 5: Illustration of a two-layer KAN architecture.
  • ...and 2 more figures