CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model
Jianhao Zeng, Dan Song, Weizhi Nie, Hongshuo Tian, Tongtong Wang, Anan Liu
TL;DR
This work addresses the dual challenges of controllability and speed in diffusion-model-based virtual try-on. It introduces CAT-DM, which combines a Garment-Conditioned Diffusion Model (GC-DM) with a truncation-based acceleration strategy that seeds the diffusion process from the output of a pre-trained GAN, enabling rapid, high-fidelity garment synthesis. GC-DM leverages ControlNet and enhanced garment feature extraction (via DINO-V2) to preserve garment patterns, and uses Poisson blending to merge the generated result with the original image. On DressCode and VITON-HD, CAT-DM achieves state-of-the-art realism and garment detail with as few as two diffusion steps, offering substantial speedups over traditional diffusion methods and competitive performance relative to GAN-based baselines.
Abstract
Generative Adversarial Networks (GANs) dominate research on image-based virtual try-on, but have not resolved problems such as unnatural garment deformation and blurry generation quality. While the generative quality of diffusion models is impressive, achieving controllability poses a significant challenge when applying them to virtual try-on, and the need for multiple denoising iterations limits their potential for real-time applications. In this paper, we propose Controllable Accelerated virtual Try-on with Diffusion Model (CAT-DM). To enhance controllability, a basic diffusion-based virtual try-on network is designed, which utilizes ControlNet to introduce additional control conditions and improves the feature extraction of garment images. In terms of acceleration, CAT-DM initiates the reverse denoising process from an implicit distribution generated by a pre-trained GAN-based model. Compared with previous try-on methods based on diffusion models, CAT-DM not only retains the pattern and texture details of the in-shop garment but also reduces the number of sampling steps without compromising generation quality. Extensive experiments demonstrate the superiority of CAT-DM over both GAN-based and diffusion-based methods in producing more realistic images and accurately reproducing garment patterns.
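The acceleration idea in the abstract, starting the reverse denoising process from a GAN output rather than from pure noise, can be illustrated with a minimal sketch. This is not the authors' implementation. The linear beta schedule, the deterministic DDIM-style update, and every name used here (`make_alpha_bars`, `noise_to_step`, `ddim_step`, `truncated_sample`, `eps_model`, `t_start`) are assumptions made for illustration. The GAN image is noised forward to an intermediate timestep `t_start`, and only a few reverse steps are run from there.

```python
import numpy as np

def make_alpha_bars(num_train_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)

def noise_to_step(x0, t, alpha_bars, rng):
    """Forward diffusion: jump directly from a clean image x0 to noise level t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def ddim_step(x_t, t, t_prev, eps_pred, alpha_bars):
    """One deterministic DDIM-style update from timestep t to t_prev.

    t_prev == -1 denotes the fully denoised image (alpha_bar treated as 1).
    """
    a_t = alpha_bars[t]
    a_prev = alpha_bars[t_prev] if t_prev >= 0 else 1.0
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps_pred) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps_pred

def truncated_sample(gan_image, eps_model, alpha_bars, t_start, num_steps, rng):
    """Start the reverse process at t_start from a noised GAN output.

    eps_model(x_t, t) is a hypothetical noise predictor standing in for the
    conditioned diffusion network.
    """
    x = noise_to_step(gan_image, t_start, alpha_bars, rng)
    # Evenly spaced timesteps from t_start down to -1 (the clean image).
    ts = np.linspace(t_start, -1, num_steps + 1).round().astype(int)
    for t, t_prev in zip(ts[:-1], ts[1:]):
        x = ddim_step(x, t, t_prev, eps_model(x, t), alpha_bars)
    return x
```

With `num_steps=2` the loop calls the noise predictor only twice, which is where the speedup over sampling from pure noise comes from in this sketch. The choice of `t_start` trades off how much of the GAN output is kept against how much the diffusion model can correct.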
