
PoseGraphNet++: Enriching 3D Human Pose with Orientation Estimation

Soubarna Banik, Edvard Avagyan, Sayantan Auddy, Alejandro Mendoza Gracia, Alois Knoll

TL;DR

PoseGraphNet++ addresses a limitation of skeleton-based 3D human pose estimation (HPE), which predicts only joint positions, by estimating both joint positions and bone orientations from 2D poses. It introduces a node-edge graph convolutional network with adaptive adjacency and neighbor-group kernels, and uses a 6D rotation representation to obtain stable bone orientation estimates. The approach performs close to the state of the art on Human3.6M for both position and orientation and generalizes well to MPI-INF-3DHP and 3DPW, while ablations confirm the benefit of modeling joint-bone relationships. This work enables holistic 3D pose understanding without relying on parametric body models, with potential applications in rehabilitation, action recognition, and real-time animation.
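The TL;DR mentions a 6D rotation representation. A common version, introduced by Zhou et al. (CVPR 2019), stores the first two columns of a rotation matrix and recovers the full matrix by Gram-Schmidt orthonormalization. The sketch below assumes PGN++ uses this variant, which the summary does not confirm; the function name `rot6d_to_matrix` and the column layout are illustrative assumptions.

```python
import numpy as np


def rot6d_to_matrix(x6: np.ndarray) -> np.ndarray:
    """Map a 6D vector to a 3x3 rotation matrix via Gram-Schmidt.

    Assumption: x6[:3] and x6[3:] are the first and second (unnormalized)
    columns of the target matrix, following Zhou et al. (2019).
    """
    a1, a2 = x6[:3], x6[3:]
    b1 = a1 / np.linalg.norm(a1)          # first column: normalize a1
    b2 = a2 - np.dot(b1, a2) * b1         # remove the b1 component from a2
    b2 = b2 / np.linalg.norm(b2)          # second column: normalize
    b3 = np.cross(b1, b2)                 # third column completes a right-handed basis
    return np.stack([b1, b2, b3], axis=1)


x6 = np.array([2.0, 0.1, 0.0, 0.3, 1.0, 0.5])  # e.g. a raw network output
R = rot6d_to_matrix(x6)
print(np.allclose(R.T @ R, np.eye(3)))  # True
```

The mapping is continuous and yields a valid rotation for any input whose two 3D halves are nonzero and not parallel, which is one reason 6D outputs tend to be easier to regress than Euler angles or quaternions.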

Abstract

Existing skeleton-based 3D human pose estimation methods predict only joint positions. Although the yaw and pitch of bone rotations can be derived from joint positions, the roll around the bone axis remains unresolved. We present PoseGraphNet++ (PGN++), a novel 2D-to-3D lifting Graph Convolution Network that predicts the complete 3D human pose, including both joint positions and bone orientations. We employ both node and edge convolutions to exploit joint and bone features. Our model is evaluated on multiple datasets using both position and rotation metrics. PGN++ performs on par with the state of the art (SoA) on the Human3.6M benchmark. In generalization experiments, it achieves the best position results and matches the SoA in orientation, showing a more balanced performance than the current SoA. By exploiting the mutual relationship between joints and bones, PGN++ significantly improves position predictions, as our ablation results show.
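The abstract's claim that joint positions fix yaw and pitch but not roll can be checked numerically. The sketch below uses a hypothetical single bone whose rest direction is the local z-axis; `axis_angle_matrix` (Rodrigues' formula) and `geodesic_angle` are illustrative helpers, and the geodesic angle is shown as one plausible rotation metric, not necessarily the one the paper reports.

```python
import numpy as np


def axis_angle_matrix(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about `axis`."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def geodesic_angle(R1, R2):
    """Angle in radians of the relative rotation R1^T R2."""
    cos = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


# Hypothetical bone: parent joint at the origin, child joint 1 unit away.
parent = np.zeros(3)
child = np.array([0.0, 0.6, 0.8])
length = np.linalg.norm(child - parent)
direction = (child - parent) / length

# Yaw and pitch (one possible convention) follow from the direction alone.
yaw = np.arctan2(direction[0], direction[2])
pitch = np.arcsin(direction[1])

# Assumed rest pose: the bone points along the local z-axis.
rest = np.array([0.0, 0.0, 1.0])
R_a = axis_angle_matrix(np.cross(rest, direction),
                        np.arccos(np.dot(rest, direction)))

# A second orientation: the same bone rolled 60 degrees about its own axis.
roll = np.pi / 3
R_b = axis_angle_matrix(direction, roll) @ R_a

# Both orientations place the child joint at the same position...
child_a = parent + length * (R_a @ rest)
child_b = parent + length * (R_b @ rest)
print(np.allclose(child_a, child_b))          # True
# ...yet they differ by the full roll angle (about 60 degrees).
print(np.degrees(geodesic_angle(R_a, R_b)))
```

Because the two orientations differ only by a rotation about the bone itself, no loss on joint positions can tell them apart; resolving the roll requires predicting orientation directly.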

Paper Structure

This paper contains 19 sections, 6 equations, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: PoseGraphNet++ architecture. (Left) Overall structure of the network. The residual block shown with a dotted box is repeated thrice. (Right) Expanded view of the proposed Node-Edge module, showing the node and edge convolution layers.
  • Figure 2: Graph representation of the human body: (a) Node definitions with examples of neighbor groups of two nodes (dotted circles). Blue, pink and green show the self, parent and child neighbors, respectively. (b) Edge definitions with examples of neighbor groups of two edges (dotted ellipses). Blue, pink, green and maroon highlight the self, parent, child and junction neighbors, respectively.
  • Figure 3: Qualitative results on the H36M test set for two actions with noisy CPN inputs. Each plot shows the ground-truth skeleton on the left and the predicted one on the right. (a, c) show the best cases and (b, d) the failure cases.
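Figures 1 and 2 describe node and edge convolutions whose kernels depend on the neighbor group (self, parent, child, and, for edges, junction). The NumPy sketch below illustrates that idea under several assumptions: the 7-joint skeleton in `PARENTS` is a hypothetical toy, not the Human3.6M layout; "junction" neighbors are taken to be sibling bones leaving the same joint; and the adaptive adjacency is modeled as learnable per-link logits normalized by a masked softmax, which may differ from PGN++'s formulation. All function names are hypothetical.

```python
import numpy as np

# Hypothetical toy skeleton: 0 pelvis, 1 spine, 2 head, 3 l_hip, 4 l_knee,
# 5 r_hip, 6 r_knee. Entry j is the parent of joint j (-1 = root).
PARENTS = [-1, 0, 1, 0, 3, 0, 5]


def node_groups(parents):
    """Binary adjacency per node neighbor group: self, parent, child."""
    n = len(parents)
    A = np.zeros((3, n, n))
    for j, p in enumerate(parents):
        A[0, j, j] = 1.0          # self
        if p >= 0:
            A[1, j, p] = 1.0      # joint j gathers from its parent p
            A[2, p, j] = 1.0      # joint p gathers from its child j
    return A


def edge_groups(parents):
    """Binary adjacency per bone neighbor group: self, parent, child, junction.

    Bone j links parents[j] -> j for every non-root joint j. 'Junction' is
    assumed to mean sibling bones leaving the same start joint.
    """
    bones = [j for j, p in enumerate(parents) if p >= 0]
    row = {j: k for k, j in enumerate(bones)}  # joint id -> bone index
    A = np.zeros((4, len(bones), len(bones)))
    for j in bones:
        b, p = row[j], parents[j]
        A[0, b, b] = 1.0                      # self
        if p in row:                          # bone ending at p is the parent
            A[1, b, row[p]] = 1.0
            A[2, row[p], b] = 1.0             # ...and bone b is its child
        for s in bones:                       # siblings sharing start joint p
            if s != j and parents[s] == p:
                A[3, b, row[s]] = 1.0
    return A


def masked_softmax(logits, mask):
    """Row-wise softmax over existing links only; empty rows become zeros."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True)) * mask
    s = e.sum(axis=1, keepdims=True)
    return np.divide(e, s, out=np.zeros_like(e), where=s > 0)


def neighbor_group_conv(H, A, W, M):
    """One graph convolution with a separate kernel per neighbor group.

    H: (n, d_in) features, A: (g, n, n) group adjacency,
    W: (g, d_in, d_out) group kernels, M: (g, n, n) learnable link logits
    (the assumed 'adaptive adjacency'; all zeros gives uniform weights).
    """
    out = np.zeros((H.shape[0], W.shape[2]))
    for g in range(A.shape[0]):
        out += masked_softmax(M[g], A[g]) @ H @ W[g]
    return out


rng = np.random.default_rng(0)
A_node, A_edge = node_groups(PARENTS), edge_groups(PARENTS)
X = rng.normal(size=(len(PARENTS), 2))       # toy 2D joint inputs
W = rng.normal(size=(3, 2, 16)) * 0.1         # one 2x16 kernel per group
Y = neighbor_group_conv(X, A_node, W, np.zeros_like(A_node))
print(Y.shape)       # (7, 16)
print(A_edge.shape)  # (4, 6, 6)
```

Sharing one kernel per group rather than per node keeps the parameter count independent of skeleton size while still letting parents and children be weighted differently; an edge branch would apply the same routine to bone features using `A_edge`.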