TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On
Jiazheng Xing, Chao Xu, Yijie Qian, Yang Liu, Guang Dai, Baigui Sun, Yong Liu, Jingdong Wang
TL;DR
The paper tackles two core issues in diffusion-based virtual try-on: preserving fine-grained garment identity and improving training efficiency. It introduces TryOn-Adapter, which factorizes clothing identity into style, texture, and structure, and injects the corresponding cues into a pre-trained diffusion backbone through three lightweight adapters. The backbone stays frozen except for its attention layers, and a training-free T-RePaint strategy further reinforces clothing identity at inference time. An enhanced latent blending module further stabilizes image synthesis. Together these yield high-fidelity results with about half the tunable parameters of recent full-tuning diffusion-based methods. Results on VITON-HD and DressCode show state-of-the-art identity preservation and realism. The work also provides ablations and analyses that isolate the contribution of each component.
Abstract
Virtual try-on focuses on adjusting the given clothes to fit a specific person seamlessly while avoiding any distortion of the patterns and textures of the garment. However, existing diffusion-based methods suffer from two significant limitations that hinder widespread application: clothing identity is hard to control, and training is inefficient. These methods struggle to maintain the identity even with full-parameter training. In this work, we propose an effective and efficient framework, termed TryOn-Adapter. Specifically, we first decouple clothing identity into fine-grained factors: style for color and category information, texture for high-frequency details, and structure for smooth spatial adaptive transformation. Our approach uses a pre-trained exemplar-based diffusion model as the backbone network, whose parameters are frozen except for the attention layers. We then customize three lightweight modules (Style Preserving, Texture Highlighting, and Structure Adapting), combined with fine-tuning techniques, to enable precise and efficient identity control. Meanwhile, we introduce the training-free T-RePaint strategy to further enhance clothing identity preservation while maintaining the realistic try-on effect during inference. Our experiments demonstrate that our approach achieves state-of-the-art performance on two widely used benchmarks. Additionally, compared with recent full-tuning diffusion-based methods, we use only about half of their tunable parameters during training. The code will be made publicly available at https://github.com/jiazheng-xing/TryOn-Adapter.
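As a rough illustration of what "frozen except for the attention layers" means for the parameter budget, the sketch below selects trainable parameters by name and reports the trainable fraction. It is not the authors' code: the substring `attn`, the parameter names, and the sizes are all made-up assumptions, and the three adapter modules (which are also trained) are left out. In a PyTorch model the same selection would typically iterate over `model.named_parameters()` and set `requires_grad` on each tensor.

```python
from typing import Dict, List

# Hypothetical substring marking attention parameters; real checkpoints may name them differently.
ATTENTION_KEY = "attn"


def select_trainable(param_sizes: Dict[str, int], key: str = ATTENTION_KEY) -> List[str]:
    """Return the names of parameters that stay trainable (name contains `key`)."""
    return [name for name in param_sizes if key in name]


def trainable_fraction(param_sizes: Dict[str, int], trainable: List[str]) -> float:
    """Fraction of all parameters that receive gradient updates."""
    total = sum(param_sizes.values())
    tuned = sum(param_sizes[name] for name in trainable)
    return tuned / total


# Toy parameter table: names and element counts are illustrative, not taken from the paper.
toy_params = {
    "down.0.resnet.weight": 600,
    "down.0.attn.to_q.weight": 100,
    "down.0.attn.to_k.weight": 100,
    "mid.resnet.weight": 800,
    "mid.attn.to_v.weight": 200,
    "up.0.resnet.weight": 200,
}

trainable = select_trainable(toy_params)
frac = trainable_fraction(toy_params, trainable)
print(trainable)
print(f"trainable fraction: {frac:.2f}")
```

Selecting by name keeps the freezing rule in one place, so changing which layers are tuned only requires changing the key or the predicate.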
