DeClotH: Decomposable 3D Cloth and Human Body Reconstruction from a Single Image

Hyeongjin Nam, Donghwan Kim, Jeongtaek Oh, Kyoung Mu Lee

TL;DR

DeClotH tackles decomposable 3D cloth and human body reconstruction from a single image, addressing severe cloth–body occlusion. It introduces a template-regularized optimization over two DMTets to jointly recover cloth and body geometry and texture, using ClothNet/BodyNet templates as priors. A cloth-specific diffusion model, ClothDiffusion, guides cloth appearance and is integrated with cloth and human SDS losses, while a HumanDiffusion-based loss handles occluded body regions. Experiments on 4D-DRESS and THuman2.0 report better geometry (CD/NC), texture (PSNR/LPIPS), and 3D cloth decomposition (POR score) than state-of-the-art methods, enabling practical applications such as virtual try-on and pose editing. Limitations include restricted cloth diversity and cloth–body inter-penetration; future work targets richer cloth templates and better cloth–body interaction modeling.
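To make the geometry and texture metrics named above concrete, the following is a minimal sketch of Chamfer distance (CD) and PSNR in NumPy. The function names, the symmetric-average form of CD, and the `max_val` parameter are assumptions for illustration; the paper's exact evaluation protocol (point sampling, units, squared versus unsquared distances) is not stated here and may differ.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Assumption: unsquared Euclidean distances, averaged over both directions.
    """
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Mean nearest-neighbour distance from a to b and from b to a.
    return float(0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean()))


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    mse = float(np.mean((pred - target) ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

Lower CD indicates closer geometry and higher PSNR indicates closer texture. The dense pairwise matrix is fine for a few thousand points; larger point clouds would normally use a spatial index instead.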

Abstract

Most existing methods for 3D clothed human reconstruction from a single image treat the clothed human as a single object without distinguishing between cloth and human body. In this regard, we present DeClotH, which separately reconstructs 3D cloth and human body from a single image. This task remains largely unexplored due to the extreme occlusion between cloth and the human body, which makes it challenging to infer accurate geometries and textures. Moreover, while recent 3D human reconstruction methods have achieved impressive results using text-to-image diffusion models, directly applying such an approach to this problem often leads to incorrect guidance, particularly in reconstructing 3D cloth. To address these challenges, we propose two core designs in our framework. First, to alleviate the occlusion issue, we leverage 3D template models of cloth and human body as regularizations, which provide strong geometric priors to prevent erroneous reconstruction caused by occlusion. Second, we introduce a cloth diffusion model specifically designed to provide contextual information about cloth appearance, thereby enhancing the reconstruction of 3D cloth. Qualitative and quantitative experiments demonstrate that our proposed approach is highly effective in reconstructing both 3D cloth and the human body. More qualitative results are provided at https://hygenie1228.github.io/DeClotH/.

Paper Structure

This paper contains 25 sections, 8 equations, 15 figures, 7 tables.

Figures (15)

  • Figure 1: Overview of DeClotH. Given a single image, our framework reconstructs 3D cloth and human body based on the 3D cloth and body templates.
  • Figure 2: Comparison between an existing diffusion model and ClothDiffusion. Unlike the representative diffusion model StableDiffusion (Rombach et al., 2022), our ClothDiffusion generates cloth-specific images and can be controlled by cloth and human body templates.
  • Figure 3: Overall pipeline of DeClotH. Given an input image $\textbf{I}$, DeClotH optimizes the 3D cloth and human body, each represented by a DMTet (\ref{sec:dmtet}). For the optimization, we extract a normal map $\textbf{N}$, a silhouette $\textbf{S}$, and 3D template meshes ($\textbf{M}^{\text{t}}_{\text{cloth}}$ and $\textbf{M}^{\text{t}}_{\text{body}}$) (\ref{sec:preprocessing}). Subsequently, the 3D cloth and human body are optimized by three core loss functions: template regularization loss (\ref{sec:regloss}), cloth SDS loss (\ref{sec:cloth_sds}), and human SDS loss (\ref{sec:human_sds}).
  • Figure 4: Training process of ClothDiffusion. We train ClothDiffusion on our collected cloth-specific training data. ClothDiffusion follows the ControlNet architecture built on the pre-trained StableDiffusion.
  • Figure 5: Effects of the optimization process of DeClotH.
  • ...and 10 more figures
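
For readers unfamiliar with the SDS losses named in the Figure 3 caption, the generic score distillation sampling gradient, as commonly written in the text-to-3D literature, is given below for reference. The symbols are those of the general formulation and are assumptions here, not notation taken from this paper: $\theta$ are the 3D parameters (for DeClotH, presumably the DMTet parameters), $x$ is a rendering of the 3D representation, $x_t$ is $x$ noised to timestep $t$ with noise $\epsilon$, $\hat{\epsilon}_\phi$ is the diffusion model's noise prediction under condition $y$, and $w(t)$ is a timestep weighting. The paper's cloth and human SDS losses may differ in conditioning and weighting.

$$
\nabla_\theta \mathcal{L}_{\text{SDS}} = \mathbb{E}_{t,\epsilon}\left[ w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,\frac{\partial x}{\partial \theta} \right]
$$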