Probabilistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation

Xianzhou Zeng, Hao Qin, Ming Kong, Luyuan Chen, Qiang Zhu

TL;DR

This work tackles two limitations of 3D human pose estimation (HPE): the ill-posed nature of 2D-to-3D pose lifting and sensitivity to 2D detection errors. It introduces PRPose, a framework that converts lightweight single-hypothesis HPE (SH-HPE) models into multi-hypothesis (MH-HPE) estimators. PRPose comprises two modules: an SH-HPE backbone and a Weakly Supervised Adaptive Noise Learning (WS-ANL) module, which learns per-joint adaptive noise to generate multiple plausible 2D inputs; the SH-HPE model then maps each of these inputs to a 3D pose. Using weak supervision from pseudo-labels derived from SH-HPE errors, the method estimates an adaptive variance for each joint and samples S augmented 2D poses, yielding diverse, realistic hypotheses far more efficiently than generative MH-HPE approaches. Experiments on Human3.6M and MPI-INF-3DHP show competitive accuracy with large speedups (over 100× in some configurations) and good generalization to unseen scenes, underscoring the practical value of extending lightweight SH-HPE models to the MH-HPE setting. The approach offers a flexible path toward scalable, real-time multi-hypothesis 3D pose estimation across diverse environments.
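The sampling step described above can be sketched in a few lines. This is a minimal NumPy illustration, not the PRPose implementation: it assumes the per-joint standard deviations `sigma` have already been predicted, that the noise is zero-mean Gaussian, and that `lift_fn` stands in for any single-hypothesis lifter. The names `sample_hypotheses`, `toy_lift`, and `num_hypotheses` (playing the role of S) are hypothetical.

```python
import numpy as np


def sample_hypotheses(pose_2d, sigma, lift_fn, num_hypotheses, rng=None):
    """Draw S noisy copies of one 2D pose and lift each copy to 3D.

    pose_2d        : (J, 2) detected 2D keypoints.
    sigma          : (J,) or (J, 2) per-joint standard deviations
                     (ASSUMED to come from an adaptive variance model).
    lift_fn        : any single-hypothesis lifter mapping (J, 2) -> (J, 3).
    num_hypotheses : S, the number of hypotheses to draw.
    """
    rng = np.random.default_rng() if rng is None else rng
    pose_2d = np.asarray(pose_2d, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    if sigma.ndim == 1:
        sigma = sigma[:, None]  # one std per joint, shared by x and y

    # Zero-mean Gaussian noise, scaled joint by joint: shape (S, J, 2)
    eps = rng.standard_normal((num_hypotheses,) + pose_2d.shape)
    samples_2d = pose_2d[None] + sigma[None] * eps

    # The same single-hypothesis lifter is reused for every sample: (S, J, 3)
    poses_3d = np.stack([lift_fn(p) for p in samples_2d])
    return samples_2d, poses_3d


def toy_lift(p):
    """Hypothetical placeholder lifter: appends zero depth to each joint."""
    return np.concatenate([p, np.zeros((p.shape[0], 1))], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pose = rng.uniform(-1.0, 1.0, size=(17, 2))  # 17 joints, as in Human3.6M
    sigma = rng.uniform(0.0, 0.05, size=17)      # illustrative per-joint stds
    s2d, s3d = sample_hypotheses(pose, sigma, toy_lift, num_hypotheses=5, rng=rng)
    print(s3d.shape)  # (5, 17, 3)
```

Passing `sigma = np.full(J, c)` gives a constant-variance (non-adaptive) baseline, while a per-joint vector lets some joints spread more widely than others. A real pipeline would likely batch the S samples through the lifter in one forward pass; the list comprehension only keeps the sketch readable.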

Abstract

The accuracy and robustness of 3D human pose estimation (HPE) are limited by 2D pose detection errors and the ill-posed nature of 2D-to-3D lifting, challenges that have drawn considerable attention to Multi-Hypothesis HPE (MH-HPE) research. Most existing MH-HPE methods are based on generative models, which are computationally expensive and difficult to train. In this study, we propose a Probabilistic Restoration 3D Human Pose Estimation framework (PRPose) that can be integrated with any lightweight single-hypothesis model. Specifically, PRPose employs a weakly supervised approach to fit the hidden probability distribution of the 2D-to-3D lifting process in the Single-Hypothesis HPE (SH-HPE) model, and then reverse-maps this distribution to the 2D pose input through an adaptive noise sampling strategy, effectively generating plausible multi-hypothesis samples. Extensive experiments on 3D HPE benchmarks (Human3.6M and MPI-INF-3DHP) demonstrate the effectiveness and efficiency of PRPose. Code is available at: https://github.com/xzhouzeng/PRPose.
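The summary does not spell out how the error-based pseudo-labels are built or which loss fits them. As one hedged reading, the sketch below assumes the pseudo-label for each joint is the Euclidean 3D error of a frozen SH-HPE prediction, and that the variance model is fit to it with a mean-squared-error loss; `error_pseudo_labels`, `weak_supervision_loss`, and `scale` are hypothetical names. It only illustrates why such labels count as weak supervision: no ground-truth variance exists, so the lifter's own per-joint mistakes stand in for it.

```python
import numpy as np


def error_pseudo_labels(pred_3d, gt_3d):
    """Per-joint Euclidean error of a frozen SH-HPE model's 3D prediction.

    pred_3d, gt_3d : (N, J, 3) predicted and ground-truth 3D poses.
    Returns an (N, J) array of non-negative targets.
    ASSUMPTION: one plausible error-based pseudo-label, not necessarily
    the exact definition used by PRPose.
    """
    diff = np.asarray(pred_3d, dtype=float) - np.asarray(gt_3d, dtype=float)
    return np.linalg.norm(diff, axis=-1)


def weak_supervision_loss(pred_sigma, pseudo_labels, scale=1.0):
    """Mean squared error between predicted per-joint stds and scaled labels.

    `scale` is a hypothetical hyperparameter that maps 3D error magnitudes
    onto the scale of the 2D input noise.
    """
    target = scale * np.asarray(pseudo_labels, dtype=float)
    return float(np.mean((np.asarray(pred_sigma, dtype=float) - target) ** 2))


if __name__ == "__main__":
    pred = np.zeros((1, 2, 3))
    gt = np.array([[[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]]])
    labels = error_pseudo_labels(pred, gt)
    print(labels)                                           # [[5. 0.]]
    print(weak_supervision_loss(np.zeros((1, 2)), labels))  # 12.5
```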

Paper Structure (17 sections, 18 equations, 5 figures, 7 tables)

Figures (5)

  • Figure 1: (a) The classical SH-HPE architecture produces a single 3D pose from a single input 2D pose; (b) the traditional generative-model-based MH-HPE architecture adds random noise to generate multi-hypothesis results; (c) our proposed PRPose, which can be extended from any SH-HPE method, generates the input distribution through adaptive noise to reconstruct the original probabilistic modeling process of SH-HPE, producing multi-hypothesis outputs.
  • Figure 2: Our proposed architecture, PRPose, consists of two main modules: (a) Weakly Supervised Adaptive Noise Learning (WS-ANL), which performs weakly supervised training of the Adaptive Variance Generation ($AVG$) model; (b) Single-Hypothesis Human Pose Estimation (SH-HPE), which, during inference, superimposes adaptive noise on the original 2D pose to generate multiple 2D hypotheses and maps each single 2D pose to a 3D pose with an SH-HPE model (such as HTNet).
  • Figure 3: Qualitative comparison of multi-hypothesis outputs from the Sample-Joints adapted (SJA) and non-adapted (NA) methods on the Human3.6M dataset.
  • Figure 4: Two paradigms for embedding $AVG$ into the PRPose framework: (a) the GCN layers that make up $AVG$ have independent weights; (b) $AVG$ shares part of the GCN weights with the SH-HPE model, so only one additional mapping head is needed to obtain the noise estimate.
  • Figure 5: Qualitative comparison of outputs at different intermediate stages for the Sample-Joints adapted (SJA) and non-adapted (NA) methods on the Human3.6M dataset. From left to right: (a) the original image; (b) the variance applied to each joint, either a constant variance or the variance matrix generated by $AVG$, where the diameter of each circle indicates the magnitude of the variance; (c) the 2D samples augmented by adaptive noise sampling.
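Paradigm (b) in Figure 4 can be illustrated with one shared graph-convolution trunk feeding two heads. This is a hypothetical, untrained NumPy sketch (random weights, a single GCN layer, class and parameter names invented for illustration), not the HTNet or PRPose architecture. A softplus keeps the noise head's output non-negative so it can serve as a per-joint standard deviation. Paradigm (a) would instead give the variance branch its own copy of `w_gcn`.

```python
import numpy as np


def normalized_adjacency(edges, num_joints):
    """Symmetric-normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softplus(x):
    return np.logaddexp(0.0, x)  # log(1 + e^x): smooth and non-negative


class SharedTrunkLifter:
    """Hypothetical sketch of paradigm (b): one GCN trunk, two output heads."""

    def __init__(self, edges, num_joints, hidden=64, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        self.a_hat = normalized_adjacency(edges, num_joints)
        self.w_gcn = rng.normal(scale=0.1, size=(2, hidden))    # shared trunk
        self.w_pose = rng.normal(scale=0.1, size=(hidden, 3))   # SH-HPE head
        self.w_noise = rng.normal(scale=0.1, size=(hidden, 1))  # extra AVG head

    def features(self, pose_2d):
        # One graph-convolution layer: ReLU(A_hat @ X @ W), shape (J, hidden)
        return np.maximum(self.a_hat @ pose_2d @ self.w_gcn, 0.0)

    def forward(self, pose_2d):
        h = self.features(np.asarray(pose_2d, dtype=float))
        pose_3d = h @ self.w_pose                 # (J, 3) 3D pose estimate
        sigma = softplus(h @ self.w_noise)[:, 0]  # (J,) per-joint noise std
        return pose_3d, sigma


if __name__ == "__main__":
    chain = [(0, 1), (1, 2), (2, 3), (3, 4)]  # toy 5-joint skeleton
    model = SharedTrunkLifter(chain, num_joints=5, hidden=16)
    pose_3d, sigma = model.forward(np.ones((5, 2)))
    print(pose_3d.shape, sigma.shape)  # (5, 3) (5,)
```

Because the trunk is shared, the variance estimate costs only one extra matrix product per forward pass, which is the trade-off paradigm (b) is described as making.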