Probabilistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation

Xianzhou Zeng; Hao Qin; Ming Kong; Luyuan Chen; Qiang Zhu

TL;DR

This work tackles the ill-posed nature of 2D-to-3D pose lifting and the sensitivity of 3D human pose estimation (HPE) to 2D detection errors by introducing PRPose, a framework that converts lightweight single-hypothesis (SH-HPE) models into multi-hypothesis (MH-HPE) estimators. PRPose comprises two modules: an SH-HPE backbone and a Weakly Supervised Adaptive Noise Learning (WS-ANL) module that learns per-joint adaptive noise to generate multiple plausible 2D inputs, which the SH-HPE model then maps to multiple 3D poses. Using weak supervision from pseudo-labels derived from SH-HPE errors, the method estimates an adaptive variance for each joint and samples S augmented 2D poses, yielding diverse, realistic hypotheses at substantially lower computational cost than generative MH-HPE approaches. Experiments on Human3.6M and MPI-INF-3DHP show competitive accuracy with major speedups (over 100× in some configurations) and good generalization to new scenes. The approach offers a flexible path toward scalable, real-time multi-hypothesis 3D pose estimation across diverse environments.
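The sampling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes zero-mean Gaussian noise, uses a hand-set `sigma` array in place of the variances that WS-ANL would learn, and replaces the SH-HPE backbone with a hypothetical `toy_lifter` function. All names here are illustrative.

```python
import numpy as np


def sample_2d_hypotheses(pose_2d, sigma, num_samples, rng):
    """Draw num_samples noisy copies of a 2D pose.

    pose_2d: (J, 2) array of detected joint coordinates.
    sigma:   (J,) per-joint noise scale (assumed stand-in for learned values).
    Returns an array of shape (num_samples, J, 2).
    """
    noise = rng.standard_normal((num_samples,) + pose_2d.shape)
    # Broadcast each joint's scale over samples and over the x/y axis.
    return pose_2d[None] + noise * sigma[None, :, None]


def toy_lifter(pose_2d_batch):
    """Hypothetical placeholder for an SH-HPE model: appends a fake depth."""
    depth = 0.1 * pose_2d_batch.sum(axis=-1, keepdims=True)
    return np.concatenate([pose_2d_batch, depth], axis=-1)


rng = np.random.default_rng(0)
pose_2d = rng.uniform(-1.0, 1.0, size=(17, 2))  # 17 joints, normalized coords

# Assumed scales: joints 3 and 6 are treated as less reliable than the rest.
sigma = np.full(17, 0.01)
sigma[[3, 6]] = 0.05

hyps_2d = sample_2d_hypotheses(pose_2d, sigma, num_samples=200, rng=rng)
hyps_3d = toy_lifter(hyps_2d)  # one 3D pose per sampled 2D pose

print(hyps_2d.shape, hyps_3d.shape)
```

Because every hypothesis is a single forward pass of the same lightweight lifter, the S samples can be batched, which is consistent with the efficiency argument made for this design over generative samplers.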

Abstract

The accuracy and robustness of 3D human pose estimation (HPE) are limited by 2D pose detection errors and by the ill-posed nature of 2D-to-3D lifting, which has drawn considerable attention to Multi-Hypothesis HPE (MH-HPE) research. Most existing MH-HPE methods are based on generative models, which are computationally expensive and difficult to train. In this study, we propose a Probabilistic Restoration 3D Human Pose Estimation framework (PRPose) that can be integrated with any lightweight single-hypothesis model. Specifically, PRPose employs a weakly supervised approach to fit the hidden probability distribution of the 2D-to-3D lifting process in the Single-Hypothesis HPE (SH-HPE) model and then reverse-maps the distribution to the 2D pose input through an adaptive noise sampling strategy, generating reasonable multi-hypothesis samples efficiently. Extensive experiments on 3D HPE benchmarks (Human3.6M and MPI-INF-3DHP) highlight the effectiveness and efficiency of PRPose. Code is available at: https://github.com/xzhouzeng/PRPose.

Paper Structure (17 sections, 18 equations, 5 figures, 7 tables)

Introduction
Related Work
SH-HPE
MH-HPE
Method
Overview
SH-HPE Module
WS-ANL Module
Experiments
Datasets and Evaluation Metrics
Implementation Details
Quantitative Results
Qualitative Results
Conclusion
Probability modeling derivation of PRPose
...and 2 more sections

Figures (5)

Figure 1: (a) The classical SH-HPE architecture produces a unique 3D pose from a single input 2D pose; (b) the traditional MH-HPE architecture, based on a generative model, adds random noise to generate multi-hypothesis results; (c) the proposed PRPose, which can be built on any SH-HPE method, generates the input distribution through adaptive noise to reconstruct the original probabilistic modeling process of SH-HPE and produces multi-hypothesis outputs.
Figure 2: Our proposed architecture PRPose consists of two main modules: (a) Weakly Supervised Adaptive Noise Learning (WS-ANL), which completes the weakly supervised training of the Adaptive Variance Generation model; (b) Single-Hypothesis Human Pose Estimation (SH-HPE), which during inference superimposes adaptive noise on the original 2D pose to generate multiple 2D hypotheses and maps each 2D pose to a 3D pose with an SH-HPE model (such as HTNet).
Figure 3: Qualitative comparison of multi-hypothesis outputs for Sample-Joints adapted (SJA) and No adapted (NA) methods on the Human3.6M dataset.
Figure 4: Two paradigms of $AVG$ embedded into the PRPose framework: (a) the GCN layer comprising the $AVG$ has an independent weight; (b) $AVG$ shares part of the GCN weight with the SH-HPE model, and only one additional mapping head is needed to obtain the noise estimation.
Figure 5: Qualitative comparison of the outputs at different intermediate stages for Sample-Joints adapted (SJA) and No adapted (NA) methods on the Human3.6M dataset. From left to right: (a) the original image, (b) the results of applying a constant variance/variance matrix generated by $AVG$ to each joint, where the diameter of the circle indicates the magnitude of the variance, (c) the 2D samples augmented by adaptive noise sampling.
