
GarmentCrafter: Progressive Novel View Synthesis for Single-View 3D Garment Reconstruction and Editing

Yuanhao Wang, Cheng Zhang, Gonçalo Frazão, Jinlong Yang, Alexandru-Eugen Ichim, Thabo Beeler, Fernando De la Torre

TL;DR

GarmentCrafter addresses single-view 3D garment reconstruction and editing by progressively synthesizing depth-aware novel views along a predefined zigzag camera trajectory. It completes each warped view with a diffusion-based inpainting model and fuses the results into a shared point cloud. The method estimates an initial depth map $D_0$ and colored point cloud $P_0$ from a single image. Then, for each viewpoint $\pi_i$, it produces a completed image $I_i$ and depth map $D_i$ and merges them into the updated point cloud $P_i$. A depth completion network and a multi-view diffusion model enforce cross-view coherence. This yields high-fidelity geometry and textures and lets 2D edits propagate consistently to the 3D garment. Screened Poisson surface reconstruction converts the final colored point cloud into a textured mesh, and the method outperforms state-of-the-art single-view garment reconstruction baselines. This work broadens access to editable 3D garments from a single input image, with potential applications in virtual try-on and AI-assisted fashion design.

Abstract

We introduce GarmentCrafter, a new approach that enables non-professional users to create and modify 3D garments from a single-view image. While recent advances in image generation have facilitated 2D garment design, creating and editing 3D garments remains challenging for non-professional users. Existing methods for single-view 3D reconstruction often rely on pre-trained generative models to synthesize novel views conditioned on the reference image and camera pose, yet they lack cross-view consistency, failing to capture the internal relationships across different views. In this paper, we tackle this challenge through progressive depth prediction and image warping to approximate novel views. Subsequently, we train a multi-view diffusion model to complete occluded and unknown clothing regions, informed by the evolving camera pose. By jointly inferring RGB and depth, GarmentCrafter enforces inter-view coherence and reconstructs precise geometries and fine details. Extensive experiments demonstrate that our method achieves superior visual fidelity and inter-view coherence compared to state-of-the-art single-view 3D garment reconstruction methods.
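The "image warping to approximate novel views" step can be pictured as splatting the current point cloud into the next camera with a z-buffer. The pixels that receive no point are exactly the occluded or unknown regions the diffusion model must complete. The following is a minimal sketch under stated assumptions: a pinhole camera, a world-to-camera convention $X_\text{cam} = R X + t$, nearest-pixel splatting, and a hypothetical function name `warp_points`. The paper's actual renderer may differ.

```python
import numpy as np

def warp_points(points, colors, R, t, fx, fy, cx, cy, h, w):
    """Splat a colored point cloud into a target view with a z-buffer.

    R (3x3) and t (3,) map world to camera coordinates: X_cam = R @ X + t.
    Returns the partial RGB image, the depth map (0 where empty), and a
    boolean mask of empty pixels that a completion model would have to fill.
    """
    cam = points @ R.T + t
    in_front = cam[:, 2] > 1e-6              # drop points behind the camera
    cam, cols = cam[in_front], colors[in_front]
    u = np.round(fx * cam[:, 0] / cam[:, 2] + cx).astype(int)
    v = np.round(fy * cam[:, 1] / cam[:, 2] + cy).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, cols = u[inside], v[inside], cam[inside, 2], cols[inside]

    rgb = np.zeros((h, w, 3))
    depth = np.full((h, w), np.inf)
    # z-buffer test: keep only the nearest point landing on each pixel.
    for ui, vi, zi, ci in zip(u, v, z, cols):
        if zi < depth[vi, ui]:
            depth[vi, ui] = zi
            rgb[vi, ui] = ci
    holes = np.isinf(depth)
    depth[holes] = 0.0
    return rgb, depth, holes
```

When two points fall on the same pixel, the nearer one occludes the farther one, which is what makes the warped view depth-consistent.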

Paper Structure

This paper contains 33 sections, 5 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: From a real-world clothing image, GarmentCrafter synthesizes high-quality novel views, enabling the reconstruction of garment meshes with accurate geometry and rich detail. Additionally, users can easily apply 2D edits (e.g., modifying parts or surface details) using off-the-shelf tools on a single image, and GarmentCrafter seamlessly applies these edits across the 3D model with multi-view consistency.
  • Figure 2: An illustration of progressive novel view synthesis in GarmentCrafter. Left: Given a garment image, our method performs depth-aware novel view synthesis along a predefined zigzag camera trajectory. Right: For each camera rotation from $\pi_{i-1}$ to $\pi_{i}$, we project the current point cloud $P_{i-1}$ into the image space based on camera pose $\pi_{i}$, resulting in incomplete RGB and depth images. Our diffusion model completes the RGB image using the warped view, input image, and camera pose as conditions, while a depth completion network refines the depth map based on the completed RGB, warped depth, and camera pose. The re-projected point cloud $P'_{i}$ is then merged with $P_{i-1}$ to produce an updated point cloud $P_{i}$. This iterative process continues until a full 3D representation of the garment is achieved.
  • Figure 3: Qualitative comparison on single-view 3D garment reconstruction with state-of-the-art methods. Our method demonstrates better performance in handling complex texture patterns and geometric structures compared to InstantMesh xu2024instantmesh, Hunyuan3D-1.0 yang2024tencent, and Convolutional Reconstruction Model (CRM) wang2025crm.
  • Figure 4: More qualitative results of GarmentCrafter on single-view reconstruction. Please see supplementary for more results.
  • Figure 5: Analysis of projected image conditioning. Left: we show original input and projected RGB images. Middle: completed RGB images with and without Progressive Novel View Synthesis (P-NVS). Right: difference between completed and projected images, showing our novel view aligns more closely with the ground-truth projected RGB. Zoom in for details.
  • ...and 9 more figures
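The loop described in Figure 2 can be summarized as a control-flow skeleton. Several assumptions apply. Every callable passed in (`render`, `complete_rgb`, `complete_depth`, `unproject`) is a hypothetical placeholder standing in for the warping step, the multi-view diffusion model, the depth completion network, and back-projection. The `zigzag_trajectory` parameterization (azimuth advancing while elevation alternates) is an illustrative guess, not the paper's exact camera path. Merging by lifting only the newly completed hole pixels is likewise an assumption about how $P'_i$ is combined with $P_{i-1}$.

```python
from typing import List, Tuple

import numpy as np

Pose = Tuple[float, float]  # hypothetical (azimuth_deg, elevation_deg) parameterization

def zigzag_trajectory(n_views: int, step_deg: float = 30.0, elev_deg: float = 15.0) -> List[Pose]:
    """Hypothetical zigzag path: azimuth advances each step, elevation alternates up/down."""
    return [(i * step_deg, elev_deg if i % 2 else -elev_deg) for i in range(1, n_views + 1)]

def progressive_nvs(P0, ref_image, poses, render, complete_rgb, complete_depth, unproject):
    """Skeleton of progressive novel view synthesis; all callables are placeholders.

    P0: (points, colors) lifted from the input view.
    render(P, pose) -> (rgb, depth, holes)                  # warp P_{i-1} into view pi_i
    complete_rgb(rgb, holes, ref_image, pose) -> rgb        # stand-in for the diffusion model
    complete_depth(rgb, depth, holes, pose) -> depth        # stand-in for depth completion
    unproject(rgb, depth, mask, pose) -> (points, colors)   # lift selected pixels to 3D
    """
    points, colors = P0
    for pose in poses:
        rgb, depth, holes = render((points, colors), pose)
        rgb = complete_rgb(rgb, holes, ref_image, pose)
        depth = complete_depth(rgb, depth, holes, pose)
        # Assumption: only newly revealed (hole) pixels are lifted, so P_{i-1} is not duplicated.
        new_pts, new_cols = unproject(rgb, depth, holes, pose)
        points = np.concatenate([points, new_pts])
        colors = np.concatenate([colors, new_cols])
    return points, colors
```

In this structure, each iteration sees all geometry accumulated so far. That is why later views stay consistent with earlier ones rather than being generated independently. The final point cloud would then be meshed, for example with Screened Poisson reconstruction.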