FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion

Abhishek Kumar Singh, Ioannis Patras

TL;DR

This work addresses fashion garment generation conditioned on multimodal inputs (text and sketches) by introducing FashionSD-X, a latent-diffusion pipeline that combines LoRA fine-tuning with ControlNet conditioning. Two pipelines are proposed: a text-conditioned Stable Diffusion model fine-tuned with LoRA, and a ControlNet-enhanced variant trained on sketch data. Both use virtual try-on datasets (Dress Code and VITON-HD) extended with garment sketches. The study contributes this sketch-augmented dataset, a novel sketch-similarity evaluation, and an evaluation using FID, FID-CLIP, KID, CLIP Score, and SSIM that reports improved realism and sketch adherence compared with vanilla Stable Diffusion. The findings suggest that diffusion-based approaches can support interactive, personalized garment synthesis in fashion design workflows.
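
To make the LoRA part of the summary concrete, here is a minimal sketch of the standard low-rank update, y = W x + (alpha / r) B (A x), in plain Python. It is not the authors' implementation: the layer sizes, the rank, the scaling factor `alpha`, and all function names are assumptions chosen for illustration, since the summary does not state which layers are adapted or which hyperparameters are used.

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, rank):
    """y = W x + (alpha / rank) * B (A x). W stays frozen; only A and B train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]

def lora_param_count(d_out, d_in, rank):
    """Trainable parameters in A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)

# Hypothetical toy sizes; real attention projections are far larger.
random.seed(0)
d_out, d_in, rank, alpha = 6, 8, 2, 4.0
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen weight
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # small random init
B = [[0.0] * rank for _ in range(d_out)]                                 # zero init
x = [random.gauss(0, 1) for _ in range(d_in)]

y = lora_forward(W, A, B, x, alpha, rank)
trainable = lora_param_count(d_out, d_in, rank)
full = d_out * d_in
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and fine-tuning only has to learn the low-rank correction. In this toy setting the adapter trains `rank * (d_in + d_out)` values instead of the full `d_out * d_in` matrix.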

Abstract

The rapid evolution of the fashion industry increasingly intersects with technological advancements, particularly through the integration of generative AI. This study introduces a novel generative pipeline designed to transform the fashion design process by employing latent diffusion models. Utilizing ControlNet and LoRA fine-tuning, our approach generates high-quality images from multimodal inputs such as text and sketches. We leverage and enhance state-of-the-art virtual try-on datasets, including Multimodal Dress Code and VITON-HD, by integrating sketch data. Our evaluation, utilizing metrics like FID, CLIP Score, and KID, demonstrates that our model significantly outperforms standard Stable Diffusion models. The results not only highlight the effectiveness of our model in generating fashion-appropriate outputs but also underscore the potential of diffusion models in revolutionizing fashion design workflows. This research paves the way for more interactive, personalized, and technologically enriched methodologies in fashion design and representation, bridging the gap between creative vision and practical application.
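
The sketch conditioning mentioned in the abstract relies on ControlNet, whose general idea can be illustrated with a toy example: a trainable copy of a frozen block receives the conditioning signal through zero-initialized layers, and its output is added back to the frozen block's output. The code below is a simplified one-dimensional sketch of that idea, not the paper's pipeline. The stand-in blocks, the scalar "zero convolutions", and every name and value here are assumptions made for illustration.

```python
def frozen_block(x):
    """Stand-in for a frozen Stable Diffusion block (hypothetical toy function)."""
    return [2.0 * v + 1.0 for v in x]

def trainable_copy(x):
    """Trainable copy of the block; identical to the frozen block at initialization."""
    return [2.0 * v + 1.0 for v in x]

def zero_conv(x, weight, bias):
    """A 1x1 convolution on a single-channel signal reduces to scale-and-shift."""
    return [weight * v + bias for v in x]

def controlnet_block(x, cond, w_in=0.0, b_in=0.0, w_out=0.0, b_out=0.0):
    """Frozen output plus a residual computed from the conditioning signal."""
    # Inject the condition (for example, sketch features) through the first zero conv.
    injected = [xi + ci for xi, ci in zip(x, zero_conv(cond, w_in, b_in))]
    # Pass through the trainable copy, then through the second zero conv.
    residual = zero_conv(trainable_copy(injected), w_out, b_out)
    return [f + r for f, r in zip(frozen_block(x), residual)]

x = [0.5, -1.0, 2.0]       # toy latent features
sketch = [1.0, 0.0, 1.0]   # toy sketch features

at_init = controlnet_block(x, sketch)                            # zero convs still at zero
after_training = controlnet_block(x, sketch, w_in=1.0, w_out=0.5)  # hypothetical learned weights
```

With both zero convolutions at their initial value, `at_init` equals the frozen block's output, so adding the control branch does not disturb the pretrained model at the start of training. Once the weights move away from zero, the sketch begins to shift the output.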

Paper Structure (14 sections, 1 equation, 8 figures, 4 tables)

This paper contains 14 sections, 1 equation, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Results of our proposed pipeline, FashionSD-X, guided by a given input text prompt and sketch.
  • Figure 2: Overview of the architecture of the proposed FashionSD-X, a fashion-centric fine-tuned Stable Diffusion model.
  • Figure 3: Overview of our pipeline for integrating sketches to guide the diffusion process of the FashionSD-X model.
  • Figure 4: Basic architecture of ControlNet.
  • Figure 5: Comparison of Stable Diffusion vs. our trained model for the same prompt.
  • ...and 3 more figures