
Single View Garment Reconstruction Using Diffusion Mapping Via Pattern Coordinates

Ren Li, Cong Cao, Corentin Dumery, Yingxuan You, Hao Li, Pascal Fua

TL;DR

This work tackles single-view reconstruction of high-fidelity 3D garments, with a focus on loose-fitting clothing. It introduces DISP, a garment representation that extends Implicit Sewing Patterns (ISP) with a diffusion prior to model realistic 3D deformations in a unified UV space, together with a diffusion-based mapping model from image pixels to 3D and UV coordinates. A multi-stage pipeline combines image observations, a diffusion model that predicts the garment's back normal, and the DISP prior to recover both the rest-state and the deformed garment geometry, followed by garment refinement and body adjustment. On synthetic CLOTH3D data, the method achieves higher geometric accuracy and finer detail than state-of-the-art methods. It also supports downstream tasks such as garment retargeting and texture editing, and it generalizes to in-the-wild images despite being trained only on synthetic data.

Abstract

Reconstructing 3D clothed humans from images is fundamental to applications like virtual try-on, avatar creation, and mixed reality. While recent advances have enhanced human body recovery, accurate reconstruction of garment geometry -- especially for loose-fitting clothing -- remains an open challenge. We present a novel method for high-fidelity 3D garment reconstruction from single images that bridges 2D and 3D representations. Our approach combines Implicit Sewing Patterns (ISP) with a generative diffusion model to learn rich garment shape priors in a 2D UV space. A key innovation is our mapping model that establishes correspondences between 2D image pixels, UV pattern coordinates, and 3D geometry, enabling joint optimization of both 3D garment meshes and the corresponding 2D patterns by aligning learned priors with image observations. Despite training exclusively on synthetically simulated cloth data, our method generalizes effectively to real-world images, outperforming existing approaches on both tight- and loose-fitting garments. The reconstructed garments maintain physical plausibility while capturing fine geometric details, enabling downstream applications including garment retargeting and texture manipulation.

Paper Structure

This paper contains 30 sections, 19 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Pipeline. Given an image of a clothed person, we first estimate the front normal $\mathbf{N}_F$ of the target garment and the SMPL body model, which is used to render the body-part segmentation ($\mathbf{S}_F$, $\mathbf{S}_B$) and depth ($\mathbf{D}_F^b$, $\mathbf{D}_B^b$) images. The back normal $\mathbf{N}_B$ of the garment is then estimated by the diffusion model $\boldsymbol{\epsilon}_{\theta}^n$. Next, the mapping model $\boldsymbol{\epsilon}_{\theta}^m$ predicts the UV-coordinate images ($\mathbf{C}_F$, $\mathbf{C}_B$) and the garment depth images ($\mathbf{D}_F^g$, $\mathbf{D}_B^g$) from the garment normals and the body estimates. The incomplete UV positional map $\Tilde{\mathcal{U}}$ is produced from these predictions using camera backprojection. Finally, we fit $\Tilde{\mathcal{U}}$ to DISP to recover the complete UV positional map $\hat{\mathcal{U}}$ and the corresponding garment mesh $\mathbf{G}$, which is further improved by a refinement step.
  • Figure 2: Mapping between pixel, 3D, and UV spaces. The pixel $(x,y)$ is mapped to $(X,Y,Z)$ in 3D space using the estimated depth $d$ and the camera backprojection $P^{-1}$, and to $(u,v)$ in UV space using the estimated UV coordinates $(u,v,\sigma)$. The dashed line indicates that $(X,Y,Z)$ and $(u,v)$ are connected only indirectly, through $(x,y)$.
  • Figure 3: Recovering garment rest geometry. Given (a) the incomplete panel mask $\Tilde{\mathcal{M}}$, we fit (b) the complete panel mask $\mathcal{M}$ using Eq. \ref{eq:z}. (c) shows the overlay of $\Tilde{\mathcal{M}}$ in gray and $\mathcal{M}$ in white. (d) is the rest-state garment mesh $\bar{\mathbf{G}}$ corresponding to (b).
  • Figure 4: Body refinement. For the image in (a), we refine the initial body estimate in (c) using Eq. \ref{eq:bodyrefine}, which improves its accuracy and aligns it with the image, as shown in (b).
  • Figure 5: Qualitative comparison with state-of-the-art methods. The top and bottom rows show, respectively, the front and back views of the reconstructions produced by our method, BCNet [Jiang20d], SMPLicit [Corona21], ISP [Li23a], GaRec [Li24a], and ECON [Xiu23].
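Figures 1 and 2 describe how the incomplete UV positional map $\Tilde{\mathcal{U}}$ is assembled: each garment pixel is lifted to 3D with its estimated depth via $P^{-1}$ and placed in UV space at its estimated $(u,v)$, so the pixel acts as the bridge between the two. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The names `PinholeCamera`, `build_uv_positional_map`, and `sigma_thresh` are hypothetical. It assumes a pinhole camera with OpenCV-style axes, depth measured along the optical axis, $(u,v)$ normalized to $[0,1]$, and $\sigma$ treated as a per-pixel validity score; the paper's exact camera model and the precise role of $\sigma$ may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point3 = Tuple[float, float, float]


@dataclass
class PinholeCamera:
    """Hypothetical pinhole intrinsics (OpenCV-style: +x right, +y down, +z forward)."""
    fx: float
    fy: float
    cx: float
    cy: float

    def backproject(self, x: float, y: float, d: float) -> Point3:
        """Sketch of P^{-1}: lift pixel (x, y) with z-depth d to a camera-space 3D point."""
        return ((x - self.cx) * d / self.fx, (y - self.cy) * d / self.fy, d)


def build_uv_positional_map(
    depth: List[List[float]],
    uvs: List[List[Tuple[float, float, float]]],
    cam: PinholeCamera,
    uv_res: int,
    sigma_thresh: float = 0.5,
) -> List[List[Optional[Point3]]]:
    """Scatter backprojected pixels into a uv_res x uv_res grid indexed by (v, u).

    Cells that no pixel maps to stay None, so the result is an incomplete
    positional map; completing it is left to a later prior-fitting stage.
    """
    grid: List[List[Optional[Point3]]] = [[None] * uv_res for _ in range(uv_res)]
    for y, row in enumerate(depth):
        for x, d in enumerate(row):
            u, v, sigma = uvs[y][x]
            # Assumption: sigma is a per-pixel validity score; skip low-scoring pixels.
            if sigma < sigma_thresh:
                continue
            # Quantize normalized (u, v) in [0, 1] to a cell; clamp so u = 1 stays in range.
            i = min(int(v * uv_res), uv_res - 1)
            j = min(int(u * uv_res), uv_res - 1)
            # Pixel (x, y) links (u, v) to (X, Y, Z); later pixels overwrite earlier ones.
            grid[i][j] = cam.backproject(x, y, d)
    return grid


# Tiny 2x2 example: the bottom-left pixel is invalid (sigma = 0), so its cell stays empty.
cam = PinholeCamera(fx=2.0, fy=2.0, cx=1.0, cy=1.0)
depth = [[2.0, 2.0], [4.0, 4.0]]
uvs = [[(0.1, 0.1, 1.0), (0.9, 0.1, 1.0)],
       [(0.1, 0.9, 0.0), (0.9, 0.9, 1.0)]]
grid = build_uv_positional_map(depth, uvs, cam, uv_res=2)
```

For the back view, the same routine would presumably be applied to $\mathbf{C}_B$ and $\mathbf{D}_B^g$ and merged into the same grid. Overwriting on collision is simply the easiest choice for a sketch; averaging or keeping the point nearest the camera are equally plausible alternatives.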