StableGarment: Garment-Centric Generation via Stable Diffusion

Rui Wang, Hailong Guo, Jiaming Liu, Huaxia Li, Haibo Zhao, Xu Tang, Yao Hu, Hao Tang, Peipei Li

TL;DR

The paper tackles garment-centric (GC) image generation: preserving detailed garment textures while keeping the flexibility of diffusion-based text-to-image generation. It introduces StableGarment, a diffusion-based framework in which a garment encoder is connected through additive self-attention (ASA) layers to a frozen Stable Diffusion UNet, complemented by a dedicated try-on ControlNet and a data engine that synthesizes training data. This setup supports GC text-to-image, controllable GC text-to-image, stylized GC text-to-image, and robust virtual try-on, with separate prompts for the garment and the target image. Based on qualitative comparisons, quantitative metrics, and user studies, the authors report state-of-the-art (SOTA) virtual try-on performance and strong texture fidelity, with ablations supporting the ASA mechanism and the data-engine design. The work offers a practical, extensible path for garment-centric generation with applications in fashion, design, and e-commerce.

Abstract

In this paper, we introduce StableGarment, a unified framework to tackle garment-centric (GC) generation tasks, including GC text-to-image, controllable GC text-to-image, stylized GC text-to-image, and robust virtual try-on. The main challenge lies in retaining the intricate textures of the garment while maintaining the flexibility of pre-trained Stable Diffusion. Our solution involves the development of a garment encoder, a trainable copy of the denoising UNet equipped with additive self-attention (ASA) layers. These ASA layers are specifically devised to transfer detailed garment textures, and they also facilitate the integration of stylized base models for the creation of stylized images. Furthermore, the incorporation of a dedicated try-on ControlNet enables StableGarment to execute virtual try-on tasks with precision. We also build a novel data engine that produces high-quality synthesized data to preserve the model's ability to follow prompts. Extensive experiments demonstrate that our approach delivers state-of-the-art (SOTA) results among existing virtual try-on methods and exhibits high flexibility, with broad potential applications across garment-centric image generation tasks.
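
To make the additive self-attention idea concrete, here is a minimal NumPy sketch. It rests on an assumption not spelled out in the abstract: that an ASA layer adds a second attention term, computed from the UNet's own queries against keys and values derived from garment-encoder features, to the ordinary self-attention output. The function names, the weight names (`w_kg`, `w_vg`), and the single-head, unbatched shapes are illustrative only and are not taken from the authors' code.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Standard scaled dot-product attention for one head, no batch dimension.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def additive_self_attention(h, g, w_q, w_k, w_v, w_kg, w_vg):
    """Hypothetical ASA layer (an assumed formulation, not the authors' code).

    h: (n_tokens, d) hidden states of the denoising UNet.
    g: (m_tokens, d) features from the garment encoder at the same layer.
    """
    q = h @ w_q
    own = attention(q, h @ w_k, h @ w_v)        # ordinary self-attention
    garment = attention(q, g @ w_kg, g @ w_vg)  # same queries, garment keys/values
    return own + garment                        # additive combination


rng = np.random.default_rng(0)
d = 8
h = rng.normal(size=(16, d))  # 16 image tokens
g = rng.normal(size=(24, d))  # 24 garment tokens
weights = [rng.normal(size=(d, d)) * 0.1 for _ in range(5)]
out = additive_self_attention(h, g, *weights)
print(out.shape)  # (16, 8)
```

Under this assumed formulation, the original self-attention term is left intact and the garment information enters as a separate summand, which is consistent with the abstract's remark that ASA layers facilitate swapping in stylized base models.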

Paper Structure

This paper contains 19 sections, 7 equations, 17 figures, and 7 tables.

Figures (17)

  • Figure 1: The proposed StableGarment can perform various garment-centric generation tasks. Given a garment input, it can 1) use text prompts or control signals to generate a realistic model wearing the garment, 2) switch to stylized base models to generate stylized models wearing the garment, and 3) perform conventional virtual try-on. Across these tasks, the garment details are well preserved and the garment warping looks visually natural.
  • Figure 2: Overview of our framework, consisting of a data engine, garment encoder, and try-on ControlNet. The data engine preserves the model's capacity to follow prompts, while the garment encoder with additive self-attention layers captures garment details. Meanwhile, the try-on ControlNet is designed for virtual try-on tasks.
  • Figure 3: Comparison with subject-driven generation methods.
  • Figure 4: Qualitative comparison with baselines on VITON-HD dataset.
  • Figure 5: Comparison on Dress Code dataset.
  • ...and 12 more figures