Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models
Phuong Dam, Jihoon Jeong, Anh Tran, Daeyoung Kim
TL;DR
This paper tackles two challenges in virtual try-on: inference efficiency and identity-preserving realism. It introduces FIP-VITON, a diffusion-based framework consisting of a warping module and a try-on module, plus a mask-aware post-processing block that maintains garment texture and user identity. By adopting single-step diffusion with strong local and global conditioning and a plug-in mask-aware post-processing step, it achieves roughly 20× faster inference while maintaining competitive fidelity on VITON-HD and DressCode. Comprehensive ablations demonstrate the value of cross-attention, noise scheduling, and conditional post-processing. The approach shows promising practical impact for real-time and metaverse applications, although the post-processing adds complexity and the method relies on accurate segmentation.
Abstract
This study addresses critical issues of virtual try-on in contemporary e-commerce and the prospective metaverse. It emphasizes the challenge of preserving intricate texture details and distinctive features of both the target person and the clothes across various scenarios, such as fine clothing textures and identity characteristics like tattoos or accessories. Beyond the fidelity of the synthesized images, the efficiency of the synthesis process presents a significant hurdle. We review existing approaches and highlight their limitations and unresolved issues, e.g., omission of identity information, uncontrollable artifacts, and low synthesis speed. We then propose a novel diffusion-based solution that preserves garment texture and retains user identity during virtual try-on. The proposed network comprises two primary modules: a warping module that aligns the clothing with the individual's features, and a try-on module that refines the attire and generates missing parts. The try-on module is integrated with a mask-aware post-processing technique that ensures the integrity of the individual's identity. The method runs nearly 20 times faster at inference than the state of the art, with superior fidelity in qualitative assessments. Quantitative evaluations confirm performance comparable to the recent SOTA method on the VITON-HD and DressCode datasets. We name our model Fast and Identity Preservation Virtual TryON (FIP-VITON).
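To make the idea of mask-aware post-processing concrete, the sketch below shows one common way such a step can work: pixels that should keep the person's identity (face, hands, tattoos, accessories) are copied back from the original photo, and only the garment region is taken from the generated try-on image. This is a minimal illustration under stated assumptions, not the FIP-VITON post-processing block itself. The function name `mask_aware_composite` and the `keep_mask` convention (1 = preserve, 0 = repaint) are hypothetical, and the mask is assumed to come from an external human-parsing or segmentation model, which is also where the segmentation reliance noted above enters.

```python
import numpy as np


def mask_aware_composite(person: np.ndarray,
                         generated: np.ndarray,
                         keep_mask: np.ndarray) -> np.ndarray:
    """Hypothetical mask-aware blend of a try-on result with the source photo.

    person, generated: float arrays of shape (H, W, 3), values in [0, 1].
    keep_mask: float array of shape (H, W), values in [0, 1].
        1 marks identity pixels to copy from `person` (assumed to come
        from a segmentation model); 0 marks the garment region whose
        pixels are taken from `generated`. Fractional values give a soft blend.
    """
    if person.shape != generated.shape:
        raise ValueError("person and generated images must have the same shape")
    if keep_mask.shape != person.shape[:2]:
        raise ValueError("keep_mask must match the image height and width")
    # Clamp the mask and add a channel axis so it broadcasts over RGB.
    m = np.clip(keep_mask, 0.0, 1.0)[..., None]
    # Per-pixel convex combination: identity pixels from the source photo,
    # garment pixels from the diffusion output.
    return m * person + (1.0 - m) * generated


if __name__ == "__main__":
    person = np.ones((2, 2, 3))        # stand-in source photo
    generated = np.zeros((2, 2, 3))    # stand-in try-on output
    mask = np.array([[1.0, 1.0],       # top row: preserve identity
                     [0.0, 0.0]])      # bottom row: use generated garment
    out = mask_aware_composite(person, generated, mask)
    print(out[..., 0])
```

Using a soft (for example, blurred) mask instead of a hard binary one is a common way to hide seams at the mask boundary. The quality of the result still depends directly on how accurately the segmentation separates identity regions from the garment region.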
