3D WholeBody Pose Estimation based on Semantic Graph Attention Network and Distance Information

Sihan Wen, Xiantan Zhu, Zhiming Tan

TL;DR

The paper introduces a Semantic Graph Attention Network (SemGAN) for 3D whole-body pose estimation, fusing global context from self-attention with local skeletal priors via SemGCN. It adds a Body Part Decoder to better exploit dense joints in body, face, and hands, and incorporates Distance Information to enhance spatial reasoning, augmented by a Geometry Loss consisting of Normal and Bone losses to enforce anatomically plausible poses. Evaluations on the Human3.6M 3D WholeBody (H3WB) dataset show state-of-the-art performance, with the method achieving first place and a substantial improvement (≈15.44 mm MPJPE) over the second-best approach. Together, these components demonstrate that combining global-local representations with geometry-aware constraints yields significant gains for accurate 3D whole-body pose estimation, with strong potential for applications in AR/VR, HCI, and behavioral analysis.
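
MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth joint positions. A minimal sketch of the metric follows; the function name and the toy coordinates are illustrative assumptions, not taken from the paper.

```python
import math

def mpjpe(pred, target):
    """Mean per-joint position error: average Euclidean distance over joints."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of joints")
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)

# Two joints with made-up coordinates in millimetres (illustration only).
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
target = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error_mm = mpjpe(pred, target)  # joint errors are 0 and 5, so the mean is 2.5
```

On H3WB the same average would be taken over all 133 whole-body joints.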

Abstract

In recent years, a plethora of diverse methods have been proposed for 3D pose estimation. Among these, self-attention mechanisms and graph convolutions have both proven effective and practical. Recognizing the strengths of these two techniques, we have developed a novel Semantic Graph Attention Network that benefits from the ability of self-attention to capture global context while using graph convolutions to handle the local connectivity and structural constraints of the skeleton. We also design a Body Part Decoder that assists in extracting and refining information related to specific segments of the body. Furthermore, our approach incorporates Distance Information, enhancing the model's ability to comprehend and accurately predict spatial relationships. Finally, we introduce a Geometry Loss that imposes a critical constraint on the structural skeleton of the body, ensuring that the model's predictions adhere to the natural limits of human posture. The experimental results validate the effectiveness of our approach, demonstrating that every component of the system contributes to improved pose estimation. Compared with state-of-the-art methods, the proposed approach exceeds existing benchmark results.
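
To make the idea of a skeleton constraint concrete, here is a rough sketch of a bone-length term of the kind a Geometry Loss might include. The abstract does not give the formula, so the L1 form, the function names, and the three-joint toy skeleton are all assumptions for illustration. The Normal term mentioned in the TL;DR is not sketched.

```python
import math

def bone_lengths(joints, bones):
    """Length of each bone, where a bone is a (parent, child) joint index pair."""
    return [math.dist(joints[i], joints[j]) for i, j in bones]

def bone_loss(pred, target, bones):
    """Assumed form: mean absolute difference between predicted and true bone lengths."""
    pred_len = bone_lengths(pred, bones)
    true_len = bone_lengths(target, bones)
    return sum(abs(p - t) for p, t in zip(pred_len, true_len)) / len(bones)

# Hypothetical 3-joint chain with two bones (not the H3WB skeleton).
bones = [(0, 1), (1, 2)]
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]  # bone lengths 1, 1
pred = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, 0.0)]    # bone lengths 2, 1
loss = bone_loss(pred, target, bones)  # (|2 - 1| + |1 - 1|) / 2 = 0.5
```

A term like this penalizes predictions whose limb proportions drift from the ground truth even when the per-joint position error is small.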

Paper Structure (15 sections, 4 equations, 3 figures, 2 tables)

This paper contains 15 sections, 4 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Overview of proposed method.
  • Figure 2: Semantic Graph Attention Encoder.
  • Figure 3: Body Part Decoder. B is the batch size, 133 is the number of whole-body joints, 256 is the feature dimension, and 23, 68, and 42 are the numbers of body, face, and hand joints, respectively.
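
The joint counts in the Figure 3 caption sum to the whole-body total (23 + 68 + 42 = 133). The sketch below splits per-joint features into the three part groups for a single sample. The contiguous body, face, hands index order and the helper name `split_parts` are assumptions for illustration; the caption does not state how the decoder indexes joints.

```python
# Joint counts from the Figure 3 caption.
NUM_BODY, NUM_FACE, NUM_HANDS = 23, 68, 42
NUM_JOINTS = NUM_BODY + NUM_FACE + NUM_HANDS  # 133
FEATURE_DIM = 256

def split_parts(joint_features):
    """Split a list of 133 per-joint feature vectors into body, face, and hands.

    Assumes joints are stored contiguously in body, face, hands order.
    """
    if len(joint_features) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} joints, got {len(joint_features)}")
    body = joint_features[:NUM_BODY]
    face = joint_features[NUM_BODY:NUM_BODY + NUM_FACE]
    hands = joint_features[NUM_BODY + NUM_FACE:]
    return body, face, hands

# One sample: 133 joints, each with a 256-dimensional placeholder feature.
features = [[0.0] * FEATURE_DIM for _ in range(NUM_JOINTS)]
body, face, hands = split_parts(features)
part_sizes = (len(body), len(face), len(hands))
```

With a batch dimension B, the same slicing would apply along the joint axis of a (B, 133, 256) tensor.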