GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting

Haodong Chen, Yongle Huang, Haojian Huang, Xiangsheng Ge, Dian Shao

TL;DR

This paper proposes GaussianVTON, a 3D VTON pipeline that integrates Gaussian Splatting (GS) editing with 2D VTON, and introduces a new editing strategy, Edit Recall Reconstruction (ERR), to address the difficulty previous editing strategies have in producing complex geometric changes.

Abstract

The increasing prominence of e-commerce has underscored the importance of Virtual Try-On (VTON). However, previous studies predominantly focus on the 2D realm and rely heavily on extensive data for training. Research on 3D VTON primarily centers on garment-body shape compatibility, a topic extensively covered in 2D VTON. Thanks to advances in 3D scene editing, a 2D diffusion model has now been adapted for 3D editing via multi-viewpoint editing. In this work, we propose GaussianVTON, an innovative 3D VTON pipeline integrating Gaussian Splatting (GS) editing with 2D VTON. To facilitate a seamless transition from 2D to 3D VTON, we propose, for the first time, the use of only images as editing prompts for 3D editing. To address issues that arise during editing, e.g., face blurring, garment inaccuracy, and degraded viewpoint quality, we devise a three-stage refinement strategy that mitigates them gradually. Furthermore, we introduce a new editing strategy termed Edit Recall Reconstruction (ERR) to tackle the difficulty previous editing strategies have in producing complex geometric changes. Our comprehensive experiments demonstrate the superiority of GaussianVTON, offering a novel perspective on 3D VTON while also establishing a novel starting point for image-prompting 3D scene editing.

Paper Structure

This paper contains 12 sections, 8 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: GaussianVTON enables efficient human-environment interaction in try-on applications by reconstructing and editing multi-view images. Our method is the first to employ image prompting to achieve more precise and customized 3D Gaussian Splatting editing. Unlike existing works that rely on text prompts for 3D editing, e.g., GaussianEditor [chen2023gaussianeditor], our method avoids erroneously replacing clothing, affecting other areas of the garment, and changing other elements such as the background and facial features. Furthermore, compared to text-driven 3D clothed human generation or reconstruction works, e.g., HumanGaussian [liu2023humangaussian], our method is based on real human images, avoids odd body shapes, and aligns with the prompts.
  • Figure 2: Overall framework of the proposed GaussianVTON.
  • Figure 3: Three-Stage Refinement. Our three-stage refinement strategy sequentially mitigates the prominent issues encountered when using 2D VTON models (i.e., LaDI-VTON) for image editing: facial blurring, garment inaccuracies, and degraded image quality.
  • Figure 4: Qualitative Comparison. We ask GPT-4 [openai2023gpt4] to generate detailed descriptions of the target garments and insert them into the format "Turn his upper body into ..." to form the text prompt for InstructN2N [haque2023instructnerf2nerf] and GaussianEditor [chen2023gaussianeditor]. We adopt GSEditor-iN2N from GaussianEditor as the comparison model due to its superior performance.
  • Figure 5: Extensive Results of GaussianVTON. To further validate our framework, we also apply it to multi-view image data of a female subject, which further substantiates the superiority of GaussianVTON and its ability to handle custom data.
  • ...and 1 more figure
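
The Figure 4 caption says the text-prompt baselines receive a garment description wrapped in the format "Turn his upper body into ...". A minimal sketch of that wrapping step is given below. The function name `build_text_prompt` and the sample description are hypothetical and only illustrate the format. In the paper the description comes from GPT-4, and the authors' actual prompt-building code is not shown in this summary.

```python
def build_text_prompt(garment_description: str) -> str:
    """Wrap a garment description in the instruction format quoted in the Figure 4 caption.

    Hypothetical helper for illustration only; not the authors' code.
    """
    # Collapse repeated whitespace and drop a trailing period so the
    # description reads cleanly after the fixed instruction prefix.
    description = " ".join(garment_description.split()).rstrip(".")
    return f"Turn his upper body into {description}"


if __name__ == "__main__":
    # Made-up garment description standing in for a GPT-4 generated one.
    print(build_text_prompt("a red plaid flannel shirt with long sleeves."))
```

GaussianVTON itself does not use such a text prompt. According to the abstract, it uses only images as editing prompts.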