Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models

Phuong Dam; Jihoon Jeong; Anh Tran; Daeyoung Kim

TL;DR

This paper tackles two coupled challenges in virtual try-on: inference efficiency and identity-preserving realism. It introduces FIP-VITON, a diffusion-based framework built from a warping module and a try-on module, plus a mask-aware post-processing block that preserves garment texture and user identity. By adopting single-step diffusion with strong local and global conditioning, together with the plug-in mask-aware post-processing, it achieves roughly 20× faster inference while maintaining competitive fidelity on VITON-HD and DressCode. Comprehensive ablations demonstrate the value of cross-attention, noise scheduling, and conditional post-processing. The approach shows promise for real-time and metaverse applications, although the post-processing adds complexity and the pipeline depends on reliable segmentation.
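Single-step diffusion can be illustrated with the standard DDPM identity that recovers a clean-sample estimate from a noisy sample in one network evaluation instead of many denoising steps. This is a minimal sketch assuming the common noise-prediction (ε) parameterization; the summary does not state FIP-VITON's exact parameterization, noise level, or conditioning, and the function names are hypothetical.

```python
import math


def add_noise(x0, eps, alpha_bar_t):
    """Standard DDPM forward process: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a, s = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * x + s * e for x, e in zip(x0, eps)]


def predict_x0_single_step(x_t, eps_hat, alpha_bar_t):
    """Estimate the clean sample from x_t in a single step.

    eps_hat stands in for the denoiser's noise prediction (hypothetical here;
    in a real model it would come from a conditioned network). Inverting the
    forward process gives x0_hat = (x_t - sqrt(1 - a) * eps_hat) / sqrt(a).
    """
    if not 0.0 < alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must lie in (0, 1]")
    a, s = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [(xt - s * e) / a for xt, e in zip(x_t, eps_hat)]


# With a perfect noise prediction, one step recovers x0 (up to rounding).
x0 = [0.5, -0.2, 1.0]
eps = [0.1, 0.3, -0.7]
x_t = add_noise(x0, eps, alpha_bar_t=0.3)
x0_hat = predict_x0_single_step(x_t, eps, alpha_bar_t=0.3)
```

The saving comes from calling the denoiser once rather than once per sampling step. The reported ~20× speedup is the paper's own measurement and is not derived from this sketch.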

Abstract

This study addresses critical issues of virtual try-on in contemporary e-commerce and the prospective metaverse. It emphasizes the challenge of preserving the intricate texture details and distinctive features of both the target person and the clothes across various scenarios, such as clothing textures and identity characteristics like tattoos or accessories. Beyond the fidelity of the synthesized images, the efficiency of the synthesis process presents a significant hurdle. Existing approaches are reviewed, highlighting their limitations and unresolved aspects, e.g., omission of identity information, uncontrollable artifacts, and low synthesis speed. The study then proposes a novel diffusion-based solution that addresses garment texture preservation and user identity retention during virtual try-on. The proposed network comprises two primary modules: a warping module that aligns the clothing with the individual's features, and a try-on module that refines the attire and generates missing parts. These are integrated with a mask-aware post-processing technique that ensures the integrity of the individual's identity. The method runs nearly 20 times faster at inference than the state of the art, with superior fidelity in qualitative assessments. Quantitative evaluations confirm performance comparable to the recent SOTA method on the VITON-HD and DressCode datasets. We name our model Fast and Identity Preservation Virtual TryON (FIP-VITON).
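A mask-aware post-processing step of the kind described above can be sketched as mask-guided compositing. Pixels inside an editable-region mask come from the generated try-on image, and pixels outside it are copied from the original person image, so features such as the face, tattoos, or accessories are left untouched. This is a hypothetical sketch: the summary does not specify how FIP-VITON builds its mask or whether it blends exactly this way. Images are small grids of scalars for brevity.

```python
def mask_aware_composite(generated, person, mask):
    """Blend a generated try-on image back into the original person image.

    Hypothetical sketch: mask[i][j] is 1.0 inside the region the model may
    edit (e.g. a garment area taken from a segmentation map) and 0.0
    elsewhere, so pixels outside the mask are copied unchanged from `person`.
    Soft mask values in (0, 1) blend the two images linearly.
    """
    if not (len(generated) == len(person) == len(mask)):
        raise ValueError("height mismatch")
    out = []
    for g_row, p_row, m_row in zip(generated, person, mask):
        if not (len(g_row) == len(p_row) == len(m_row)):
            raise ValueError("width mismatch")
        # out = m * generated + (1 - m) * person, pixel by pixel
        out.append([m * g + (1.0 - m) * p for g, p, m in zip(g_row, p_row, m_row)])
    return out


generated = [[0.9, 0.9], [0.9, 0.9]]
person = [[0.1, 0.2], [0.3, 0.4]]
mask = [[1.0, 0.0], [0.0, 0.5]]
result = mask_aware_composite(generated, person, mask)
```

Because identity pixels are copied rather than regenerated, their fidelity no longer depends on the diffusion model. The cost is that the output is only as good as the mask, which matches the segmentation reliance noted in the TL;DR.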

Paper Structure (41 sections, 5 equations, 19 figures, 10 tables)


Introduction
Related Work
Virtual Try-on GAN-based Models
Virtual Try-on Diffusion Models
Diffusion Model Speed Up Techniques
Methodology
Preprocessing
Warping Module
Pyramid Feature Extraction.
Cascade Flow Estimation.
Objective Function.
Try-on Module
Training pipeline.
Inference pipeline.
(Un)conditional Post-processing Block.
...and 26 more sections
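The outline names a Pyramid Feature Extraction step, and the Warping Module caption states that the model uses six or seven scales depending on input resolution. A multi-scale pyramid with N levels can be sketched by repeatedly halving the resolution. This is a hypothetical illustration: it uses fixed 2×2 average pooling on a single-channel grid, whereas a real extractor would use learned, strided convolutions. The function names are assumptions, not the paper's code.

```python
def avg_pool2x(feat):
    """2x2 average pooling with stride 2 (odd trailing rows/cols are dropped)."""
    h, w = len(feat) // 2, len(feat[0]) // 2
    return [
        [
            (feat[2 * i][2 * j] + feat[2 * i][2 * j + 1]
             + feat[2 * i + 1][2 * j] + feat[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w)
        ]
        for i in range(h)
    ]


def build_pyramid(feat, num_levels):
    """Return [level_0, ..., level_{N-1}], each half the resolution of the previous."""
    if num_levels < 1:
        raise ValueError("num_levels must be at least 1")
    levels = [feat]
    for _ in range(num_levels - 1):
        prev = levels[-1]
        if len(prev) < 2 or len(prev[0]) < 2:
            raise ValueError("input too small for the requested number of levels")
        levels.append(avg_pool2x(prev))
    return levels


feat = [[float(4 * i + j) for j in range(4)] for i in range(4)]
pyramid = build_pyramid(feat, num_levels=3)
shapes = [(len(level), len(level[0])) for level in pyramid]
```

The number of usable levels is bounded by the input size, which is consistent with (though not necessarily the paper's reason for) N varying with resolution.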

Figures (19)

Figure 1: Visualization of identity preservation and detail preservation compared with DCI-VTON and StableVITON. Both models tend to degrade clothing textures, struggle to maintain symbols on garments, and produce noticeable artifacts, while our approach maintains the fidelity of both garment textures and tattoos.
Figure 1: Visualization of cross-attention ablation studies in our approach. Please zoom in for better quality.
Figure 2: Overall Generation Pipeline.
Figure 2: Visualization of noise-level ablation studies. Please zoom in for better quality.
Figure 3: Warping Module structure. Our model extracts six or seven multi-scale features, depending on the input resolution (i.e., $N$ = 6 or 7). For brevity, this figure depicts only three scales ($N$ = 3).
...and 14 more figures
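Flow-based warping modules such as the one in Figure 3 typically rely on two primitives. The first is backward warping of an image with a dense flow field via bilinear sampling. The second is coarse-to-fine refinement, where a coarse flow is upsampled and a residual is added at the next scale. The sketch below is a generic illustration under stated assumptions: zero padding outside the image, nearest-neighbour flow upsampling, and additive residuals. The summary does not give FIP-VITON's cascade equations, and all names are hypothetical.

```python
import math


def bilinear_sample(img, x, y):
    """Sample img at continuous (x=column, y=row); zero outside the image."""
    h, w = len(img), len(img[0])
    if x < 0 or y < 0 or x > w - 1 or y > h - 1:
        return 0.0
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = (1 - dx) * img[y0][x0] + dx * img[y0][x1]
    bottom = (1 - dx) * img[y1][x0] + dx * img[y1][x1]
    return (1 - dy) * top + dy * bottom


def warp(img, flow):
    """Backward warp: output (i, j) samples img at (x=j+u, y=i+v), flow[i][j]=(u, v)."""
    h, w = len(img), len(img[0])
    return [[bilinear_sample(img, j + flow[i][j][0], i + flow[i][j][1])
             for j in range(w)] for i in range(h)]


def upsample_flow(flow):
    """Nearest-neighbour 2x upsampling; displacements double at the finer scale."""
    out = []
    for row in flow:
        fine_row = []
        for u, v in row:
            fine_row += [(2 * u, 2 * v), (2 * u, 2 * v)]
        out.append(fine_row)
        out.append(list(fine_row))
    return out


def refine(coarse_flow, residual):
    """One cascade step: upsample the coarse flow and add a finer-scale residual."""
    up = upsample_flow(coarse_flow)
    return [[(u + du, v + dv) for (u, v), (du, dv) in zip(up_row, res_row)]
            for up_row, res_row in zip(up, residual)]


img = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
shifted = warp(img, [[(1.0, 0.0)] * 3 for _ in range(2)])
refined = refine([[(1.0, 0.0)]], [[(0.0, 1.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]])
```

Here `shifted` moves every pixel one column left and fills the exposed border with zeros. `refined` shows a unit coarse displacement becoming two pixels at the finer scale, plus the local residual.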