BLAPose: Enhancing 3D Human Pose Estimation with Bone Length Adjustment

Chih-Hsiang Hsu, Jyh-Shing Roger Jang

TL;DR

This work introduces a recurrent neural network architecture that captures holistic information across entire video sequences to predict bone lengths accurately, and presents a bone length adjustment method that preserves bone orientations while substituting bone lengths with the predicted values.

Abstract

Current approaches in 3D human pose estimation primarily focus on regressing 3D joint locations, often neglecting critical physical constraints such as bone length consistency and body symmetry. This work introduces a recurrent neural network architecture designed to capture holistic information across entire video sequences, enabling accurate prediction of bone lengths. To enhance training effectiveness, we propose a novel augmentation strategy using synthetic bone lengths that adhere to physical constraints. Moreover, we present a bone length adjustment method that preserves bone orientations while substituting bone lengths with predicted values. Our results demonstrate that existing 3D human pose estimation models can be significantly enhanced through this adjustment process. Furthermore, we fine-tune human pose estimation models using inferred bone lengths and observe notable improvements. Our bone length prediction model surpasses the previous best results, and our adjustment and fine-tuning methods enhance performance across several metrics on the Human3.6M dataset.
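
The adjustment step described in the abstract (keep each bone's orientation, replace its length) can be illustrated with a minimal sketch. This is not the authors' implementation: the five-joint skeleton in `PARENTS`, the function name `adjust_bone_lengths`, and the dictionary format of `new_lengths` are all hypothetical choices made for illustration, and the target lengths would in practice come from a bone length prediction model.

```python
import math

# Hypothetical toy skeleton (an assumption, not the Human3.6M layout):
# PARENTS[j] is the parent of joint j, and the root has parent -1.
# Joints are ordered so that every parent precedes its children.
PARENTS = [-1, 0, 1, 0, 3]


def adjust_bone_lengths(pose, new_lengths, parents=PARENTS):
    """Rebuild a 3D pose, keeping bone directions but using new bone lengths.

    pose:        list of (x, y, z) joint positions.
    new_lengths: dict mapping a child joint index to its target bone length.
    """
    adjusted = [None] * len(pose)
    for j, p in enumerate(parents):
        if p < 0:
            # The root joint is left where it is.
            adjusted[j] = tuple(pose[j])
            continue
        # Bone vector in the original pose (parent -> child).
        bone = [c - q for c, q in zip(pose[j], pose[p])]
        norm = math.sqrt(sum(b * b for b in bone))
        if norm == 0.0:
            # Degenerate bone: the direction is undefined, so the joint
            # is placed on its (already adjusted) parent.
            adjusted[j] = adjusted[p]
            continue
        # Rescale the bone to the target length and attach it to the
        # adjusted parent so that changes propagate down the chain.
        scale = new_lengths[j] / norm
        adjusted[j] = tuple(a + scale * b for a, b in zip(adjusted[p], bone))
    return adjusted


pose = [(0, 0, 0), (0, 2, 0), (0, 2, 2), (3, 0, 0), (3, 4, 0)]
target = {1: 1.0, 2: 1.0, 3: 1.0, 4: 2.0}
adjusted = adjust_bone_lengths(pose, target)
```

Each child joint is re-attached to its already adjusted parent, so a length change near the root shifts the whole limb below it while every bone keeps its original direction.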

Paper Structure

This paper contains 19 sections, 12 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: (a) The representation of a human pose with joint labels. (b) The overview of bone length replacement, which involves decomposing the pose into bone directions and bone lengths, and then substituting the original bone lengths with new ones.
  • Figure 2: This error bar plot shows the means and the standard deviations of bone lengths in the Human3.6M dataset. Each mean value is represented by a dot, and the associated standard deviation is shown by the bars, indicating the variability around the mean bone lengths.
  • Figure 3: The structures of our bone length prediction models. The input length is 3 for illustration.
  • Figure 4: The overview of bone length adjustment. The 3D pose estimation (the blue part) is based on existing 2D-to-3D lifting models. Only the parameters in the blue part are fine-tuned.
  • Figure 5: The average bone length error comparison across all frames of the test set in Human3.6M. ($*$) including test data statistics.
  • ...and 1 more figure