ClothHMR: 3D Mesh Recovery of Humans in Diverse Clothing from Single Image

Yunqi Gao, Leyuan Liu, Yuhan Li, Changxin Gao, Yuanyuan Liu, Jingying Chen

TL;DR

ClothHMR tackles 3D human mesh recovery under diverse clothing with a two-module approach: clothing tailoring (CT), which fits garments to the body silhouette using body semantics and edge cues, and FHVM-based mesh recovering (MR), which leverages a foundational human vision model (FHVM) to produce high-fidelity intermediate representations (joints, depth, silhouette) and iteratively refine SMPL parameters. The method demonstrates significant improvements over state-of-the-art methods on Cloth4D, THuman2.0, EMDB, and 3DPW, and its efficacy is validated through ablations and a web-based virtual try-on application. The work highlights the value of combining clothing-aware preprocessing with a strong, unified foundational model to improve robustness to loose clothing and complex poses in 3D human reconstruction.

Abstract

With 3D data rapidly emerging as an important form of multimedia information, 3D human mesh recovery technology has also advanced accordingly. However, current methods mainly focus on handling humans wearing tight clothing and perform poorly when estimating body shapes and poses under diverse clothing, especially loose garments. To this end, we offer two key insights: (1) tailoring clothing to fit the human body can mitigate the adverse impact of clothing on 3D human mesh recovery, and (2) utilizing human visual information from large foundational models can enhance the generalization ability of the estimation. Based on these insights, we propose ClothHMR to accurately recover 3D meshes of humans in diverse clothing. ClothHMR primarily consists of two modules: clothing tailoring (CT) and FHVM-based mesh recovering (MR). The CT module employs body semantic estimation and body edge prediction to tailor the clothing, ensuring it fits the body silhouette. The MR module optimizes the initial parameters of the 3D human mesh by continuously aligning the intermediate representations of the 3D mesh with those inferred from the foundational human visual model (FHVM). ClothHMR can accurately recover 3D meshes of humans wearing diverse clothing, precisely estimating their body shapes and poses. Experimental results demonstrate that ClothHMR significantly outperforms existing state-of-the-art methods across benchmark datasets and in-the-wild images. Additionally, a web application for online fashion and shopping powered by ClothHMR is developed, illustrating that ClothHMR can effectively serve real-world usage scenarios. The code and model for ClothHMR are available at https://github.com/starVisionTeam/ClothHMR.

Paper Structure

This paper contains 20 sections, 15 equations, 15 figures, 2 tables.

Figures (15)

  • Figure 1: Summary of SOTA. Current SOTA methods often misestimate body shapes or fail to handle complex poses when dealing with individuals wearing loose clothing. Even for individuals in non-loose clothing, current SOTA methods may still estimate poses without sufficient precision. ClothHMR can effectively handle challenges posed by diverse types of clothing and complex poses.
  • Figure 2: Overview. Given the image $\mathcal{I}$, ClothHMR first uses the clothing tailoring module to trim the clothing to fit the human silhouette, resulting in $\mathcal{B}$. Then, it initializes the mesh $\mathcal{M}_0$ from $\mathcal{B}$ and generates intermediate representations of $\mathcal{B}$ using the foundational human vision model (FHVM), including joints $\mathcal{G}_{j}$, depth $\mathcal{G}_{d}$, and silhouette $\mathcal{G}_{m}$. The mesh recovery module aligns the intermediate representations generated by FHVM with those produced by the mesh model ($\mathcal{P}_{j}$, $\mathcal{P}_{d}$, and $\mathcal{P}_{m}$), and through iterative optimization, ultimately generates an accurate human mesh $\mathcal{M}_r$.
  • Figure 3: Results produced by ClothHMR. For each input image ($\cdot$), we show the intermediate cloth tailoring result ($\cdot\cdot$) and the final 3D human mesh recovery result ($\cdot\cdot\cdot$). ClothHMR can accurately recover 3D human meshes while addressing the challenges posed by various loose clothing and complex poses. Please zoom in to see the details.
  • Figure 4: Visual comparison with other SOTA methods on the Cloth4D, THuman2.0, EMDB and 3DPW datasets. Please zoom in to see the details.
  • Figure 5: Visual comparison with other SOTA methods on in-the-wild images. Please zoom in to see the details.
  • ...and 10 more figures
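
The alignment loop described in the Figure 2 overview (matching the mesh-derived $\mathcal{P}_{j}$, $\mathcal{P}_{d}$, $\mathcal{P}_{m}$ to the FHVM-derived $\mathcal{G}_{j}$, $\mathcal{G}_{d}$, $\mathcal{G}_{m}$) can be illustrated with a toy sketch. This is not the authors' implementation: the linear `render` function, the two-element parameter vector, the squared-error loss, the term weights, and the learning rate are all assumptions chosen so the example runs with the Python standard library alone. A real system would use an SMPL body model and a differentiable renderer in place of `render`.

```python
# Toy sketch of iterative alignment; every name and constant here is hypothetical.

def render(params):
    """Stand-in for the mesh model: maps parameters to (P_j, P_d, P_m)."""
    scale, shift = params
    joints = [scale * x + shift for x in (0.0, 1.0, 2.0)]   # P_j
    depth = [scale * z + shift for z in (0.5, 1.5)]         # P_d
    silhouette = [scale * 2.0]                              # P_m (a width-like measure)
    return joints, depth, silhouette


def alignment_loss(params, targets, weights):
    """Weighted sum of squared errors between rendered and target representations."""
    preds = render(params)
    return sum(
        w * sum((p - g) ** 2 for p, g in zip(pred, target))
        for w, pred, target in zip(weights, preds, targets)
    )


def numeric_grad(f, params, eps=1e-6):
    """Central-difference gradient, used instead of autodiff to avoid dependencies."""
    grad = []
    for i in range(len(params)):
        hi, lo = list(params), list(params)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def refine(init, targets, weights=(1.0, 0.5, 0.5), lr=0.05, steps=500):
    """Plain gradient descent from the initial parameters (the role of M_0)."""
    params = list(init)
    loss = lambda p: alignment_loss(p, targets, weights)
    for _ in range(steps):
        grad = numeric_grad(loss, params)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


# Targets play the role of (G_j, G_d, G_m); here they come from known parameters.
targets = render((1.2, 0.3))
refined = refine((1.0, 0.0), targets)
print(refined)
```

Because the toy loss is quadratic in the parameters, a fixed step size is enough for convergence here; the weights only control how strongly each representation pulls on the estimate.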