KBody: Towards general, robust, and aligned monocular whole-body estimation

Nikolaos Zioulis, James F. O'Brien

TL;DR

KBody tackles robust monocular whole-body estimation by combining data-driven priors with an optimization-based fitting stage. It introduces a two-stage pipeline that optionally completes partial images using a StyleGAN-Human prior and then fits a parametric body model with a disentangled optimization loop, augmented by virtual joints and an asymmetric distance-field silhouette objective. The key contributions are virtual joints for improved keypoint correspondence, a disentangled body optimization that balances pose and shape, and an asymmetric distance field that robustly guides silhouette alignment. The approach improves pixel alignment and achieves competitive pose and shape accuracy on challenging in-the-wild data, while trading speed for accuracy relative to single-shot estimators. This work advances practical monocular body fitting by producing robust, metrically coherent estimates even for partially observed bodies, which can support downstream applications such as avatar creation and virtual try-on.
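The summary does not spell out the asymmetric distance-field objective. The sketch below shows one plausible reading, not KBody's actual formulation. It assumes a soft rendered silhouette in [0, 1] and a binary target silhouette S. Rendered pixels that leak outside S are penalized by their distance to S, and pixels of S left uncovered are penalized by their depth inside S. The separate weights `w_out` and `w_in`, and both function names, are hypothetical and exist only to make the asymmetry explicit.

```python
import numpy as np


def distance_to_mask(mask: np.ndarray) -> np.ndarray:
    """Euclidean distance from every pixel to the nearest True pixel of `mask`.

    Brute force (O(H*W*N)), adequate only for small illustrative images.
    Pixels inside the mask get distance 0.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask must contain at least one pixel")
    gy, gx = np.mgrid[0:mask.shape[0], 0:mask.shape[1]]
    d2 = (gy[..., None] - ys) ** 2 + (gx[..., None] - xs) ** 2  # (H, W, N)
    return np.sqrt(d2.min(axis=-1))


def asymmetric_silhouette_loss(rendered, target, w_out=1.0, w_in=0.5):
    """Hypothetical asymmetric distance-field silhouette term.

    rendered: soft silhouette in [0, 1], shape (H, W)
    target:   binary silhouette mask S, shape (H, W)
    """
    target = np.asarray(target, dtype=bool)
    d_out = distance_to_mask(target)   # 0 inside S, grows with distance outside S
    d_in = distance_to_mask(~target)   # 0 outside S, grows toward the interior of S
    # Rendered mass leaking outside S, weighted by how far it leaks.
    leak = np.sum(rendered * d_out)
    # Parts of S left uncovered, weighted by how deep inside S they lie.
    missing = np.sum(np.clip(target.astype(float) - rendered, 0.0, None) * d_in)
    return (w_out * leak + w_in * missing) / target.sum()
```

In an actual fitting loop, `rendered` would come from a differentiable renderer and the term would be written in an autodiff framework. NumPy is used here only to show the weighting. Choosing `w_out != w_in` lets leakage and under-coverage be penalized differently, for example to tolerate loose clothing in the predicted mask.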

Abstract

KBody is a method for fitting a low-dimensional body model to an image. It follows a predict-and-optimize approach, relying on data-driven model estimates for the constraints used to solve for the body's parameters. Acknowledging the importance of high-quality correspondences, it leverages "virtual joints" to improve fitting performance, disentangles the optimization of the pose and shape parameters, and integrates asymmetric distance fields to strike a balance between pose and shape capturing capacity and pixel alignment. We also show that generative model inversion offers a strong appearance prior that can complete partial human images and serve as a building block for generalized and robust monocular body fitting. Project page: https://zokin.github.io/KBody.
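Figure 3 describes the virtual joints as points parameterized by barycentric weights over triangles formed by existing (raw and manually picked) joints, with negative weights allowing a virtual joint to lie outside its triangle. The sketch below illustrates that parameterization only. The joint positions, triangle indices and weights in it are hypothetical placeholders, not KBody's fitted values, and the helper names are illustrative.

```python
import numpy as np


def barycentric_weights(p, a, b, c):
    """Least-squares barycentric weights (w_a, w_b, w_c) of point p w.r.t. triangle abc.

    The weights always sum to 1; a weight is negative when p lies outside
    the triangle, beyond the edge opposite the corresponding corner.
    """
    m = np.stack([b - a, c - a], axis=1)              # (3, 2) edge basis
    (wb, wc), *_ = np.linalg.lstsq(m, p - a, rcond=None)
    return np.array([1.0 - wb - wc, wb, wc])


def virtual_joints(joints, triangles, weights):
    """Place virtual joints as affine combinations of existing joints.

    joints:    (J, 3) joint positions of the posed body
    triangles: (V, 3) integer joint indices, one triangle per virtual joint
    weights:   (V, 3) barycentric weights, rows summing to 1 (may be negative)
    """
    weights = np.asarray(weights, dtype=float)
    if not np.allclose(weights.sum(axis=1), 1.0):
        raise ValueError("each row of barycentric weights must sum to 1")
    corners = joints[np.asarray(triangles)]           # (V, 3, 3)
    return np.einsum("vk,vkd->vd", weights, corners)


# Hypothetical example: a point outside triangle (0, 1, 2) needs a negative weight.
joints = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [5, 5, 5]], dtype=float)
target = np.array([1.5, -0.5, 0.0])
w = barycentric_weights(target, joints[0], joints[1], joints[2])
print(w)                                              # weights [0, 1.5, -0.5]
print(virtual_joints(joints, [[0, 1, 2]], [w])[0])    # recovers [1.5, -0.5, 0]
```

Because each row of weights sums to one, the construction commutes with any rigid (indeed any affine) motion of the three corner joints. A virtual joint therefore follows the torso as the body model is posed, while still being free to sit where a 2D keypoint detector places its joint.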

Paper Structure (17 sections, 6 equations, 58 figures, 3 tables)

Figures (58)

  • Figure 1: Flexible, pixel-aligned, accurate body pose and shape capture is the challenging, yet ultimate, goal of monocular expressive body fitting. KBody is a general predict-and-optimize approach that improves the balance among all three traits while also gracefully handling partial images.
  • Figure 2: The KBody framework comprises two stages: an optional image-based body completion (left) and a general body fitting (right). Keypoint $\mathbf{k}$, silhouette $\mathbf{S}$ and (optionally) camera $\mathbf{c}$ constraints are predicted by the respective models $\mathcal{K}$, $\mathcal{S}$ and $\mathcal{C}$. An initial state $\boldsymbol{\beta}, \boldsymbol{\theta}, \mathbf{T}$ predicted by $\mathcal{P}$ is then iteratively optimized to fit these constraints using the rendering $\mathcal{R}$, virtual joint $\mathcal{V}$, and camera-conditioned projection $\pi$ functions. When the detected keypoints $\mathbf{k}$ are partial, the optional left-hand stage produces extrapolated keypoints $\mathbf{k}_{ex}$ to improve fits on partial images. After the masked image $\mathbf{I}^w \odot \mathbf{S}^w$ is aligned using $\mathbf{k}$ and the keypoint distribution $\bar{\mathbf{k}}_t$ expected by the generative model, an initial inversion vector $\mathbf{w}$, estimated by the single-shot inversion model $e4e$, is iteratively refined twice: first in the $\mathcal{W}$ latent space and then on the manifold $\mathcal{G}_{\phi}$, using the warped masked partial image as the constraint.
  • Figure 3: From left to right: i) the SMPL-X body surface and joints, ii) the inset torso with the barycentric parameterization comprising the triangles formed by raw and manually picked kolotouros2019learning joints, iii) our best-estimated virtual joints, and their comparison with iv) manually picked OpenPose joints bhatnagar2020combining, bhatnagar2020loopreg and v) the learned regressor joints fit to Human3.6M hedlin2022simple. As illustrated, the virtual joints can extrapolate to locations outside the triangles by using negative barycentric weights.
  • Figure 4: Left-to-right: SMPLify-X pavlakos2019expressive (light green), PyMAF-X pymafx2022 (purple), SHAPY choutas2022accurate (green) and KBody (pink).
  • Figure 5: Partial image qualitative results. Same scheme as Figure 4.
  • ...and 53 more figures
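Figure 2 states that, before inversion, the masked image is aligned using the detected keypoints k and the keypoint distribution k̄_t expected by the generative model, but it does not say how. One standard way to do this, assumed here rather than taken from the paper, is a least-squares similarity transform (Umeyama's method) estimated only from the visible keypoints, which suits partial images. The function name and the template keypoints are hypothetical.

```python
import numpy as np


def similarity_align(src, dst, visible=None):
    """Least-squares 2D similarity transform (Umeyama) mapping src -> dst.

    src:     (N, 2) detected keypoints k
    dst:     (N, 2) template keypoints standing in for the expected layout k_bar_t
    visible: optional (N,) boolean mask; only visible keypoints are used
    Returns (s, R, t) such that dst ~= s * R @ x + t for each keypoint x.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if visible is not None:
        src, dst = src[visible], dst[visible]
    if len(src) < 2:
        raise ValueError("need at least two visible keypoints")
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                     # cross-covariance of dst and src
    u, sig, vt = np.linalg.svd(cov)
    d = np.eye(2)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:   # forbid reflections
        d[1, 1] = -1.0
    r = u @ d @ vt
    s = np.trace(np.diag(sig) @ d) / (xs ** 2).sum(axis=1).mean()
    t = mu_d - s * r @ mu_s
    return s, r, t
```

Stacking the result as `np.hstack([s * R, t[:, None]])` gives a 2×3 matrix that maps image coordinates to the template frame, the form expected by common affine image-warping routines. A similarity transform keeps the body's aspect ratio intact, which is presumably what a generator trained on aligned, full-body crops needs.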