CUPS: Improving Human Pose-Shape Estimators with Conformalized Deep Uncertainty

Harry Zhang, Luca Carlone

TL;DR

CUPS tackles reliable 3D human pose-shape reconstruction from monocular video by combining a learned deep uncertainty function with conformal prediction to provide calibrated, probabilistic guarantees. It builds on a transformer-based global-local (GLoT) architecture to predict SMPL parameters and jointly learns a deep uncertainty function $S_\theta(\boldsymbol{X}, \boldsymbol{Y})$. This function serves as the conformity score of a Deep Uncertainty Conformal Set (DUCS), whose threshold $\tau^*$ accounts for non-exchangeable data. The approach achieves state-of-the-art accuracy on multiple datasets, supports multi-hypothesis prediction via Monte Carlo Dropout, and offers two practical bounds on the miscoverage gap that justify its uncertainty guarantees. Overall, CUPS advances uncertainty-aware 3D human reconstruction with principled statistical guarantees relevant to safety-critical vision tasks.
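
The calibration step behind a threshold like $\tau^*$ can be illustrated with a minimal NumPy sketch. It assumes the learned scores $S_\theta(\boldsymbol{X}_i, \boldsymbol{Y}_i)$ have already been computed on a held-out calibration set and that each calibration sample carries a non-negative weight. It uses the weighted-quantile rule from non-exchangeable conformal prediction (Barber et al., 2023), in which the test point receives unit weight placed at $+\infty$. The function names are hypothetical and this is not the authors' implementation; the membership test assumes the set takes the usual form $\{\boldsymbol{Y} : S_\theta(\boldsymbol{X}, \boldsymbol{Y}) \leq \tau^*\}$.

```python
import numpy as np


def weighted_conformal_threshold(cal_scores, weights, alpha):
    """Return tau*, the (1 - alpha) quantile of the weighted score distribution
    sum_i w~_i * delta(S_i) + w~_{n+1} * delta(+inf), as in non-exchangeable
    conformal prediction (Barber et al., 2023). Weights must be non-negative."""
    cal_scores = np.asarray(cal_scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Normalize: the test point implicitly carries weight 1 at +inf.
    w_norm = weights / (weights.sum() + 1.0)
    order = np.argsort(cal_scores)
    sorted_scores = cal_scores[order]
    cum_weight = np.cumsum(w_norm[order])
    # First sorted score at which the cumulative weight reaches 1 - alpha.
    idx = int(np.searchsorted(cum_weight, 1.0 - alpha, side="left"))
    if idx >= len(sorted_scores):
        return np.inf  # calibration mass too small: the set is unbounded
    return float(sorted_scores[idx])


def in_ducs(score, tau):
    """Hypothetical membership test: a candidate Y is kept when S_theta(X, Y) <= tau*."""
    return bool(score <= tau)
```

With unit weights this reduces to standard split conformal prediction: the threshold is the $\lceil (1-\alpha)(n+1) \rceil$-th smallest calibration score, and it becomes $+\infty$ when that index exceeds $n$. Non-uniform weights let calibration samples that resemble the test sample count for more.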

Abstract

We introduce CUPS, a novel method for learning sequence-to-sequence 3D human shapes and poses from RGB videos with uncertainty quantification. To improve on prior work, we develop a method to generate and score multiple hypotheses during training, effectively integrating uncertainty quantification into the learning process. This process yields a deep uncertainty function that is trained end-to-end with the 3D pose estimator. After training, the learned deep uncertainty model serves as the conformity score used to calibrate a conformal predictor, which assesses the quality of the output prediction. Since the data in human pose-shape learning are not fully exchangeable, we also present two practical bounds on the coverage gap in conformal prediction, providing theoretical backing for the uncertainty bound of our model. Our results indicate that by combining deep uncertainty with conformal prediction, our method achieves state-of-the-art performance across various metrics and datasets while inheriting the probabilistic guarantees of conformal prediction.
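
Generating and scoring multiple hypotheses can be sketched with Monte Carlo Dropout on a toy model: dropout stays active at inference, each stochastic forward pass yields one hypothesis, and a scoring function ranks them. In this NumPy sketch the linear head `weight`, the input features, and the scoring function are hypothetical stand-ins for the pose-shape regressor; the distance-to-mean score in the demo is only a placeholder for a learned score such as $S_\theta$.

```python
import numpy as np


def mc_dropout_hypotheses(x, weight, num_samples, p, rng):
    """Draw num_samples predictions from a toy linear head (hypothetical stand-in
    for the pose-shape regressor), keeping dropout active at inference time."""
    if not 0.0 <= p < 1.0:
        raise ValueError("dropout rate p must be in [0, 1)")
    x = np.asarray(x, dtype=float)
    hypotheses = []
    for _ in range(num_samples):
        keep = rng.random(x.shape) >= p      # Bernoulli keep-mask, P(keep) = 1 - p
        x_dropped = x * keep / (1.0 - p)     # inverted-dropout rescaling
        hypotheses.append(weight @ x_dropped)
    return np.stack(hypotheses)


def select_hypothesis(hypotheses, score_fn):
    """Score every hypothesis (lower = less uncertain) and return the best one."""
    scores = np.array([score_fn(h) for h in hypotheses])
    return hypotheses[int(np.argmin(scores))], scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 8))              # hypothetical 3-D output head
    x = rng.normal(size=8)                   # hypothetical frame features
    hyps = mc_dropout_hypotheses(x, W, num_samples=16, p=0.2, rng=rng)
    mean = hyps.mean(axis=0)
    # Placeholder score: distance to the ensemble mean (not a learned S_theta).
    best, scores = select_hypothesis(hyps, lambda h: np.linalg.norm(h - mean))
    print("selected hypothesis:", best, "score:", scores.min())
```

Setting `p=0.0` collapses every hypothesis to the deterministic prediction, which makes the dropout rate a direct knob on hypothesis diversity.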

Paper Structure

This paper contains 23 sections, 7 theorems, 50 equations, 6 figures, 3 tables.

Key Result

Theorem 1

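Under a possibly non-exchangeable data distribution, the conformal prediction set defined in Definition 2 (DUCS) satisfies the following coverage guarantee:

$$\mathbb{P}\left\{\boldsymbol{Y}_{n+1} \in \mathcal{C}(\boldsymbol{X}_{n+1})\right\} \geq 1 - \alpha - \sum_{i=1}^{n} \Tilde{w}_i \, D_{\text{TV}}\left(\boldsymbol{S}_\theta(\boldsymbol{Z}) \parallel \boldsymbol{S}_\theta(\boldsymbol{Z}^i)\right),$$

where $\Tilde{w}_i$ is the normalized weight obtained via Definition 3 (Feature Distance Weight), $D_{\text{TV}}\left(\cdot \parallel \cdot\right)$ denotes the total variation distance, $\boldsymbol{S}_\theta(\boldsymbol{Z}) = [S_\theta(\boldsymbol{Z}_i)]_{i=1}^{n+1}$ is the vector of conformity scores over the $n$ calibration samples and the test sample, and $\boldsymbol{Z}^i$ denotes $\boldsymbol{Z}$ with $\boldsymbol{Z}_i$ and $\boldsymbol{Z}_{n+1}$ swapped.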
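
Theorem 1 bounds the loss of coverage by a weighted sum of total-variation terms, each multiplied by a normalized weight $\Tilde{w}_i$, following Barber et al. (2023). The NumPy sketch below makes these quantities concrete. The weight form $w_i = \exp(-\lVert f_i - f_{n+1} \rVert / T)$ is an assumption for illustration, not necessarily the formula in Definition 3; the normalization follows Barber et al. (2023), where the test point gets weight $1/(\sum_j w_j + 1)$. The total-variation terms cannot in general be computed from data, so the sketch takes them as hypothetical estimates.

```python
import numpy as np


def feature_distance_weights(cal_feats, test_feat, temperature=1.0):
    """Hypothetical weights w_i = exp(-||f_i - f_test|| / T): calibration samples
    whose features lie closer to the test sample receive more weight."""
    cal_feats = np.asarray(cal_feats, dtype=float)
    test_feat = np.asarray(test_feat, dtype=float)
    dist = np.linalg.norm(cal_feats - test_feat, axis=1)
    return np.exp(-dist / temperature)


def normalize_weights(w):
    """Barber et al. (2023) normalization: w~_i = w_i / (sum_j w_j + 1) for
    i = 1..n, and w~_{n+1} = 1 / (sum_j w_j + 1) for the test point."""
    w = np.asarray(w, dtype=float)
    total = w.sum() + 1.0
    return w / total, 1.0 / total


def coverage_lower_bound(alpha, w_tilde, tv_estimates):
    """Right-hand side of the coverage bound: 1 - alpha - sum_i w~_i * d_TV_i,
    where tv_estimates are assumed (not computed) total-variation terms."""
    return 1.0 - alpha - float(np.dot(w_tilde, tv_estimates))
```

When every total-variation term is zero, as under exchangeability, the bound reduces to the usual $1-\alpha$. Down-weighting calibration samples whose features are far from the test sample shrinks their contribution to the gap.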

Figures (6)

  • Figure 1: CUPS sample results obtained on in-the-wild videos collected from TikTok. Given a sequence of 2D RGB frames, CUPS reconstructs a sequence of 3D human meshes; a conformal predictor, calibrated using a deep uncertainty function that is trained end-to-end with the human pose-shape estimator, then quantifies the uncertainty of the output SMPL parameters.
  • Figure 2: CUPS Overview. CUPS takes a sequence of RGB video frames as input. The frames are encoded and fed into a global-local transformer human reconstruction model, which produces SMPL parameters representing the 3D human pose and shape, as well as a decoupled global-local embedding. The output of the human reconstructor is supervised via an SMPL loss. During training, we also learn a deep uncertainty function that learns to rank the uncertainty of the produced output sequence. After training, this deep uncertainty function is used as the conformity score for constructing the conformal set used in conformal prediction.
  • Figure 3: Comparison of the number of samples proposed in the training-time ensemble.
  • Figure 4: Conformity score choices on internet videos (top) and 3DPW (bottom).
  • Figure 5: Comparison of the strength of the uncertainty loss in the total training loss.
  • ...and 1 more figure
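
Figure 2 describes a deep uncertainty function trained to rank the uncertainty of the produced outputs, and Figure 5 varies the strength of the uncertainty loss in the total training loss. One way to realize such a ranking objective is a pairwise hinge loss, sketched below in NumPy. The margin formulation, the weight `lam`, and the use of per-hypothesis errors as ranking targets are assumptions for illustration, not the paper's exact loss.

```python
import numpy as np


def pairwise_ranking_loss(scores, errors, margin=0.1):
    """Hinge loss asking uncertainty scores to order hypotheses like their true
    errors: if errors[i] < errors[j], we want scores[i] + margin <= scores[j]."""
    scores = np.asarray(scores, dtype=float)
    errors = np.asarray(errors, dtype=float)
    # active[i, j] is True when hypothesis i has strictly lower error than j.
    active = errors[:, None] < errors[None, :]
    if not active.any():
        return 0.0
    # diff[i, j] = scores[j] - scores[i]
    diff = scores[None, :] - scores[:, None]
    hinge = np.maximum(0.0, margin - diff)
    return float(hinge[active].mean())


def total_loss(smpl_loss, scores, errors, lam=0.1, margin=0.1):
    """Hypothetical total objective: SMPL supervision plus a lam-weighted
    uncertainty-ranking term."""
    return smpl_loss + lam * pairwise_ranking_loss(scores, errors, margin)
```

Increasing `lam` gives the ranking term more influence relative to the SMPL supervision; a correctly ordered set of scores separated by at least `margin` contributes zero loss.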

Theorems & Definitions (15)

  • Definition 1: Deep Uncertainty Function
  • Definition 2: DUCS
  • Definition 3: Feature Distance Weight
  • Theorem 1: Non-exchangeable Coverage (Barber et al., 2023)
  • Theorem 2: Miscoverage under Periodic Change (Barber et al., 2023)
  • Theorem 3: Miscoverage under Beta Distribution
  • Definition 4: Exchangeability in Probabilistic Distribution
  • Definition 4: DUCS
  • Lemma 4: Weight Sum Upper Bound (Harrison, 2012)
  • Theorem 4: Non-exchangeable Coverage (Barber et al., 2023)
  • ...and 5 more