A Simple Strategy for Body Estimation from Partial-View Images

Yafei Mao, Xuelu Li, Brandon Smith, Jinjin Li, Raja Bala

TL;DR

The paper tackles the ambiguity in estimating body measurements from partial-view RGB images, which arises because both capture distance and body size are unknown. It introduces a simple height normalization technique that relocates the subject's skeleton to a designated image location, thereby normalizing scale and decoupling distance from size, and demonstrates its integration into single-view HMR models as well as a two-view BMN pipeline. Empirical results on a real dataset of 6,100 training subjects and 400 test subjects across 59 measurements show TP90 error reductions of up to 2 inches, with consistent improvements across body parts and BMI categories. This preprocessing step improves robustness to partial visibility and supports more accurate virtual try-on and body personalization with minimal modification to existing architectures.

Abstract

Virtual try-on and product personalization have become increasingly important in modern online shopping, highlighting the need for accurate body measurement estimation. Although previous research has advanced in estimating 3D body shapes from RGB images, the task is inherently ambiguous as the observed scale of human subjects in the images depends on two unknown factors: capture distance and body dimensions. This ambiguity is particularly pronounced in partial-view scenarios. To address this challenge, we propose a modular and simple height normalization solution. This solution relocates the subject skeleton to the desired position, thereby normalizing the scale and disentangling the relationship between the two variables. Our experimental results demonstrate that integrating this technique into state-of-the-art human mesh reconstruction models significantly enhances partial body measurement estimation. Additionally, we illustrate the applicability of this approach to multi-view settings, showcasing its versatility.
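
The scale ambiguity described in the abstract can be illustrated with a small sketch. It assumes an idealized pinhole camera, which the abstract does not state; the function names, the focal length, and the subject heights and distances are all hypothetical values chosen for illustration. Under that assumption, a shorter subject standing close and a taller subject standing farther away can project to the same pixel height, and knowing the true height is what lets the distance be separated out.

```python
import math

def projected_height_px(body_height_m, distance_m, focal_px):
    """Pinhole approximation (an assumption): pixel height scales with H / d."""
    return focal_px * body_height_m / distance_m

def distance_from_known_height(body_height_m, height_px, focal_px):
    """Invert the same approximation once the true body height is known."""
    return focal_px * body_height_m / height_px

FOCAL_PX = 1000.0  # hypothetical focal length in pixels

# Two hypothetical subjects whose size and distance differ by the same ratio.
short_near = projected_height_px(1.50, 2.0, FOCAL_PX)
tall_far = projected_height_px(1.80, 2.4, FOCAL_PX)

# Both appear equally tall in the image, so pixel height alone
# cannot tell body size apart from capture distance.
same_apparent_size = math.isclose(short_near, tall_far)

# With the true height supplied, the distance is no longer ambiguous.
recovered_distance = distance_from_known_height(1.80, tall_far, FOCAL_PX)
```

This is only a geometric illustration of why the problem is ill-posed; it is not the normalization procedure used in the paper.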

Paper Structure (11 sections, 4 figures, 3 tables)

Figures (4)

  • Figure 1: From left to right, the capture distance decreases from 8 to 3 feet. The height-normalized versions are shown in the bottom row, where effective capture distance and scale are equalized.
  • Figure 2: Network architecture of BMN model with height normalization. Segmentation masks and body landmarks are first predicted on each view using the respective models Seg and KP. Then, this information and the subject's height $H$ and gender $G$ data are passed to the height normalization module to transform the image. Finally, the two views are concatenated together and fed into the network to predict measurements $M$, mesh vertices $V$, and SMPL shape parameters $\beta$.
  • Figure 3: Height normalization transforms the skeleton of the person from its detected location (connected by yellow lines) to the designated location (connected by red lines) through affine transformations.
  • Figure 4: Qualitative comparison of the 3D body mesh predicted by the BMN model trained (a) without and (b) with height normalization. (c) shows the SMPL-aligned ground-truth scan of the subject.
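
The skeleton relocation described in the Figure 3 caption can be sketched as a least-squares affine fit between detected and designated keypoints. This is a minimal illustration, not the authors' implementation: the helper names `estimate_affine` and `apply_affine`, the choice of three keypoints, and all pixel coordinates are hypothetical, and the paper may derive the designated locations (for example from the subject's height $H$) in a way this sketch does not model.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix A with dst ~= A @ [x, y, 1]."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    ones = np.ones((src.shape[0], 1))
    X = np.hstack([src, ones])                    # (N, 3) homogeneous points
    A_T, *_ = np.linalg.lstsq(X, dst, rcond=None) # solves X @ A.T ~= dst
    return A_T.T                                  # (2, 3)

def apply_affine(A, pts):
    """Apply the 2x3 affine matrix to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ A[:, :2].T + A[:, 2]

# Hypothetical detected keypoints in pixels: head, left hip, right hip.
detected = np.array([[320.0, 100.0], [290.0, 400.0], [350.0, 400.0]])
# Hypothetical designated locations for the same keypoints.
target = np.array([[256.0, 64.0], [236.0, 264.0], [276.0, 264.0]])

A = estimate_affine(detected, target)
moved = apply_affine(A, detected)  # detected skeleton mapped onto the target
```

The same 2x3 matrix could then be used to warp the image itself, for instance with OpenCV's `cv2.warpAffine`, so that every subject ends up at a comparable scale and position before being passed to the network.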