GarmentCrafter: Progressive Novel View Synthesis for Single-View 3D Garment Reconstruction and Editing
Yuanhao Wang, Cheng Zhang, Gonçalo Frazão, Jinlong Yang, Alexandru-Eugen Ichim, Thabo Beeler, Fernando De la Torre
TL;DR
GarmentCrafter tackles single-view 3D garment reconstruction and editing by progressively synthesizing novel views with consistent depth along a closed camera loop and fusing them into a single colored point cloud. The method first estimates an initial depth map $D_0$ and colored point cloud $P_0$ from the input image. For each subsequent viewpoint $\pi_i$, it warps the current point cloud into that view, completes the missing regions with a diffusion-based inpainting model conditioned on the warped view to obtain an image $I_i$ and depth map $D_i$, and merges the result into the point cloud $P_i$. A depth completion network and a multi-view diffusion model enforce cross-view coherence, which yields high-fidelity geometry and textures and lets 2D edits propagate consistently into 3D. Screened Poisson surface reconstruction converts the final colored point cloud into a textured mesh. In the reported experiments, the method outperforms state-of-the-art single-view garment reconstruction baselines in visual fidelity and inter-view coherence. This work broadens access to editable 3D garments from a single image, with potential applications in virtual try-on and AI-assisted fashion design.
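The progressive loop summarized above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it assumes a pinhole camera with intrinsics `K`, 4x4 camera-to-world poses, and a depth value of 0 for "no surface". The names `unproject`, `warp_to_view`, `progressive_reconstruction`, and `complete_fn` are hypothetical. The learned components (the multi-view diffusion model and the depth completion network) are represented by the caller-supplied `complete_fn`, which receives the warped image, warped depth, and hole mask and returns a completed image and depth map.

```python
import numpy as np

def unproject(depth, rgb, K, cam_to_world):
    """Lift pixels with depth > 0 to a colored point cloud in world space."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)
    pts_world = (cam_to_world @ pts_cam.T).T[:, :3]
    return pts_world, rgb[valid]

def warp_to_view(points, colors, K, world_to_cam, h, w):
    """Project the cloud into a view; return warped RGB, depth, and hole mask."""
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    cam = (world_to_cam @ pts_h.T).T[:, :3]
    front = cam[:, 2] > 1e-6                      # keep points in front of the camera
    cam, colors = cam[front], colors[front]
    u = np.round(K[0, 0] * cam[:, 0] / cam[:, 2] + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * cam[:, 1] / cam[:, 2] + K[1, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, colors = u[inside], v[inside], cam[inside, 2], colors[inside]
    depth = np.full((h, w), np.inf)
    np.minimum.at(depth, (v, u), z)               # z-buffer: nearest point per pixel
    winner = z == depth[v, u]
    rgb = np.zeros((h, w, 3))
    rgb[v[winner], u[winner]] = colors[winner]
    hole = np.isinf(depth)                        # pixels no point projected onto
    depth[hole] = 0.0
    return rgb, depth, hole

def progressive_reconstruction(rgb0, depth0, K, poses_c2w, complete_fn):
    """Walk the camera loop, completing and merging one view at a time."""
    h, w = depth0.shape
    points, colors = unproject(depth0, rgb0, K, poses_c2w[0])   # P_0
    for c2w in poses_c2w[1:]:
        rgb_w, depth_w, hole = warp_to_view(
            points, colors, K, np.linalg.inv(c2w), h, w)
        rgb_i, depth_i = complete_fn(rgb_w, depth_w, hole)      # I_i, D_i
        # Merge only the newly completed pixels so known points are not duplicated.
        new_depth = np.where(hole, depth_i, 0.0)
        new_pts, new_cols = unproject(new_depth, rgb_i, K, c2w)
        points = np.concatenate([points, new_pts])              # P_i
        colors = np.concatenate([colors, new_cols])
    return points, colors
```

In this sketch each view only contributes points for pixels that were holes after warping, which is one simple way to keep the accumulated cloud consistent across views. The final meshing step is not shown; one option would be a Screened Poisson implementation such as Open3D's `TriangleMesh.create_from_point_cloud_poisson`, which additionally requires estimated point normals.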
Abstract
We introduce GarmentCrafter, a new approach that enables non-professional users to create and modify 3D garments from a single-view image. While recent advances in image generation have facilitated 2D garment design, creating and editing 3D garments remains challenging for non-professional users. Existing methods for single-view 3D reconstruction often rely on pre-trained generative models to synthesize novel views conditioned on the reference image and camera pose, yet they lack cross-view consistency and fail to capture the internal relationships across different views. In this paper, we tackle this challenge through progressive depth prediction and image warping to approximate novel views. Subsequently, we train a multi-view diffusion model to complete occluded and unknown clothing regions, informed by the evolving camera pose. By jointly inferring RGB and depth, GarmentCrafter enforces inter-view coherence and reconstructs precise geometries and fine details. Extensive experiments demonstrate that our method achieves superior visual fidelity and inter-view coherence compared to state-of-the-art single-view 3D garment reconstruction methods.
