Score-Guided Diffusion for 3D Human Recovery

Anastasis Stathopoulos, Ligong Han, Dimitris Metaxas

TL;DR

ScoreHMR guides the denoising process of a task-agnostic diffusion model with a task-specific score, solving inverse problems for various applications without retraining the model.

Abstract

We present Score-Guided Human Mesh Recovery (ScoreHMR), an approach for solving inverse problems for 3D human pose and shape reconstruction. These inverse problems involve fitting a human body model to image observations and are traditionally solved through optimization techniques. ScoreHMR mimics model fitting approaches, but alignment with the image observation is achieved through score guidance in the latent space of a diffusion model. The diffusion model is trained to capture the conditional distribution of the human model parameters given an input image. By guiding its denoising process with a task-specific score, ScoreHMR effectively solves inverse problems for various applications without the need for retraining the task-agnostic diffusion model. We evaluate our approach in three settings: (i) single-frame model fitting; (ii) reconstruction from multiple uncalibrated views; and (iii) reconstruction of humans in video sequences. ScoreHMR consistently outperforms all optimization baselines on popular benchmarks across all settings. We make our code and models available at https://statho.github.io/ScoreHMR.
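
To make the idea of score guidance in the abstract concrete, the toy sketch below shows one common way to steer deterministic DDIM sampling with the gradient of a task loss. It is not the ScoreHMR implementation. Everything named here is a hypothetical stand-in: `toy_denoiser` (the optimal noise predictor when clean samples are standard normal), the quadratic loss behind `guidance_grad`, the four-step schedule `ALPHA_BARS`, and the `scale` value. The gradient also ignores how the noise prediction depends on the noisy sample, which a real system would obtain by backpropagating through the network.

```python
import math

# Hypothetical cumulative noise schedule, ordered from noisy to clean.
ALPHA_BARS = [0.2, 0.4, 0.6, 0.8, 1.0]


def toy_denoiser(x_t, ab_t):
    """Stand-in for a trained noise predictor (assumption): the optimal
    predictor when clean samples follow a standard normal distribution."""
    return [math.sqrt(1.0 - ab_t) * x for x in x_t]


def predict_x0(x_t, eps, ab_t):
    """Clean-sample estimate implied by a noise prediction."""
    return [(x - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
            for x, e in zip(x_t, eps)]


def guidance_grad(x0_hat, target, ab_t):
    """Gradient w.r.t. x_t of the toy loss 0.5 * ||x0_hat - target||^2,
    treating the noise prediction as constant in x_t (a simplification)."""
    return [(x0 - y) / math.sqrt(ab_t) for x0, y in zip(x0_hat, target)]


def guided_ddim_step(x_t, ab_t, ab_prev, target, scale):
    """One deterministic DDIM step with a loss-gradient correction."""
    eps = toy_denoiser(x_t, ab_t)
    x0_hat = predict_x0(x_t, eps, ab_t)
    grad = guidance_grad(x0_hat, target, ab_t)
    # Shift the noise prediction along the loss gradient.
    eps_g = [e + scale * math.sqrt(1.0 - ab_t) * g for e, g in zip(eps, grad)]
    # Recompute the clean estimate with the guided noise, then step.
    x0_g = predict_x0(x_t, eps_g, ab_t)
    return [math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * e
            for x0, e in zip(x0_g, eps_g)]


def sample(x_start, target, scale):
    """Run the guided step over the whole schedule."""
    x = list(x_start)
    for ab_t, ab_prev in zip(ALPHA_BARS[:-1], ALPHA_BARS[1:]):
        x = guided_ddim_step(x, ab_t, ab_prev, target, scale)
    return x


def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    target = [2.0, 0.5]   # toy "observation"
    start = [1.0, -1.0]   # toy noisy starting point
    print("unguided:", distance(sample(start, target, 0.0), target))
    print("guided:  ", distance(sample(start, target, 0.1), target))
```

With `scale` set to 0.0 the loop reduces to plain DDIM sampling; a positive `scale` pulls the result toward the target. In the setting the abstract describes, the toy target would be replaced by an image observation such as 2D keypoint detections, and the diffusion model itself would stay fixed.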

Paper Structure

This paper contains 21 sections, 14 equations, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: Although achieving remarkable 3D human reconstructions, a recent state-of-the-art monocular regression approach (Goel et al., 2023) may encounter challenges in aligning the human body model to the image (middle image). To address this, we propose an iterative refinement approach that utilizes image observations (e.g., 2D keypoint detections) and achieves better image-model alignment (right image).
  • Figure 2: Score-Guided Human Mesh Recovery and its applications. Top row: Overview of ScoreHMR, which iteratively refines an initial regression estimate in a loop that alternates DDIM inversion and DDIM guided sampling until the human body model aligns with the available observation. Bottom row: Applications. (a) Body model fitting to 2D keypoints. (b) Multi-view refinement of individual per-frame predictions with cross-view consistency guidance. (c) Recovery of temporally consistent and smooth 3D human motion from a video sequence given initial per-frame estimates.
  • Figure 3: Qualitative evaluation of ScoreHMR. Pink: Regression with ProHMR (Kolotouros et al., 2021). White: Regression with HMR 2.0 (Goel et al., 2023). Green: Regression + ScoreHMR (ours).
  • Figure 4: Body model fitting results. Pink: Regression (ProHMR; Kolotouros et al., 2021). White: Regression (HMR 2.0; Goel et al., 2023). Green: Regression + ScoreHMR (ours). Blue: Regression + ProHMR-fitting (Kolotouros et al., 2021). Grey: Regression + SMPLify (Bogo et al., 2016).
  • Figure 5: Diffusion model architecture. Implementation of $\mathbf{\epsilon}_{\mathbf{\phi}}(\mathbf{x}_{t}, t, \mathbf{c} = g(I))$. LN denotes Layer Normalization (Ba et al., 2016), $\parallel$ denotes concatenation, and $d$ denotes the dimension of the image features $\mathbf{c}$. Rotations are parameterized with 6D representations, thus $\mathbf{x}_{0}, \mathbf{x}_{t}, \tilde{\mathbf{\epsilon}}$ are 144-D vectors.
  • ...and 3 more figures
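
The 144-D size quoted in the Figure 5 caption follows from the 6D rotation parameterization: 144 / 6 = 24 rotations, which matches the SMPL body model if one counts the global orientation plus 23 body joints (an assumption here, since the caption does not spell it out). The sketch below unpacks such a vector into rotation matrices with the usual Gram-Schmidt construction for 6D rotations. The function names and the memory layout (first three numbers taken as the first column, next three as the second) are illustrative choices and may differ from the authors' code.

```python
import math

NUM_ROTATIONS = 24  # assumption: SMPL global orientation + 23 body joints
ROT6D_DIM = 6       # two 3-D vectors per rotation


def _normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def rot6d_to_matrix(r6):
    """Map a 6-D vector to a 3x3 rotation matrix (row-major nested lists)
    by orthonormalizing its two 3-D halves with Gram-Schmidt."""
    a1, a2 = r6[:3], r6[3:]
    b1 = _normalize(a1)
    proj = _dot(b1, a2)
    # Remove the component of a2 along b1, then normalize.
    b2 = _normalize([a - proj * b for a, b in zip(a2, b1)])
    # The third column is fixed by the right-hand rule.
    b3 = _cross(b1, b2)
    # b1, b2, b3 are the columns of the rotation matrix.
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


def unpack_pose(x):
    """Split a 144-D pose vector into 24 rotation matrices."""
    if len(x) != NUM_ROTATIONS * ROT6D_DIM:
        raise ValueError("expected a 144-D vector")
    return [rot6d_to_matrix(x[i * ROT6D_DIM:(i + 1) * ROT6D_DIM])
            for i in range(NUM_ROTATIONS)]


if __name__ == "__main__":
    rest_pose = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0] * NUM_ROTATIONS
    matrices = unpack_pose(rest_pose)
    print(len(rest_pose), len(matrices))
```

Because the construction orthonormalizes whatever it is given, any 6-D input with two linearly independent halves maps to a valid rotation, which is convenient when the values come out of a denoising step rather than from an exact rotation.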