IGR: Improving Diffusion Model for Garment Restoration from Person Image
Le Shen, Rong Huang, Zhijie Wang
TL;DR
IGR addresses the garment restoration task by conditioning a latent diffusion model on dual garment features: low-level features $F_{ll}$ from IP-Adapter and high-level semantics $F_{hl}$ from GarmNet, fused via Garment Fusion blocks into the GarmDenoiser. It uses a coarse-to-fine training strategy that starts from VITON-HD data and then fine-tunes on a GarmRe-specific dataset to improve fidelity under occlusions. Quantitatively, IGR outperforms baselines on VITON-HD and StreetTryOn on metrics such as SSIM, LPIPS, DISTS, FID, CLIP-FID, and KID, while ablations confirm the utility of GarmNet, HQFT, and a tuned guidance scale. This approach offers robust garment restoration suitable for downstream virtual try-on and fashion applications, leveraging strong diffusion priors to generate authentic garments aligned with the reference person.
Abstract
Garment restoration, the inverse of the virtual try-on task, focuses on restoring a standard garment from a person image, which requires accurately capturing garment details. However, existing methods often fail to preserve the identity of the garment or rely on complex processes. To address these limitations, we propose an improved diffusion model for restoring authentic garments. Our approach employs two garment extractors to independently capture low-level features and high-level semantics from the person image. Leveraging a pretrained latent diffusion model, these features are integrated into the denoising process through garment fusion blocks, which combine self-attention and cross-attention layers to align the restored garment with the person image. Furthermore, a coarse-to-fine training strategy is introduced to enhance the fidelity and authenticity of the generated garments. Experimental results demonstrate that our model effectively preserves garment identity and generates high-quality restorations, even in challenging scenarios such as complex garments or those with occlusions.
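To make the idea of a garment fusion block more concrete, the following is a minimal NumPy sketch of how self-attention and cross-attention could combine denoiser tokens with the two garment feature streams. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify which feature stream enters which attention layer, so the arrangement here (high-level tokens joined into self-attention, low-level tokens used in cross-attention), the single-head attention, the residual connections, and all names (`attention`, `garment_fusion_block`, `params`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q_tokens, kv_tokens, w_q, w_k, w_v):
    # Single-head scaled dot-product attention.
    q = q_tokens @ w_q
    k = kv_tokens @ w_k
    v = kv_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def garment_fusion_block(x, f_ll, f_hl, params):
    # ASSUMPTION: high-level garment tokens f_hl are joined with the denoiser
    # tokens x as keys/values, so x attends to itself and to f_hl.
    joint = np.concatenate([x, f_hl], axis=0)
    x = x + attention(x, joint, *params["self"])
    # ASSUMPTION: low-level garment tokens f_ll condition x via cross-attention.
    x = x + attention(x, f_ll, *params["cross"])
    return x

# Toy usage with random weights and features (all sizes are illustrative).
rng = np.random.default_rng(0)
d = 8
params = {
    name: [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
    for name in ("self", "cross")
}
x = rng.standard_normal((16, d))     # denoiser latent tokens
f_hl = rng.standard_normal((16, d))  # stand-in for high-level semantics
f_ll = rng.standard_normal((4, d))   # stand-in for low-level features
out = garment_fusion_block(x, f_ll, f_hl, params)
print(out.shape)  # (16, 8)
```

The output keeps the shape of the denoiser tokens, so a block of this kind can be placed inside a denoising network without changing the layers around it.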
