
HieraFashDiff: Hierarchical Fashion Design with Multi-stage Diffusion Models

Zhifeng Xie, Hao Li, Huiming Ding, Mengtian Li, Xinhan Di, Ying Cao

TL;DR

HieraFashDiff addresses the mismatch between current fashion generation models and real design workflows by modeling fashion design as a two-stage diffusion process: an ideation stage guided by high-level concepts and an iteration stage guided by low-level attributes. Built on a fine-tuned latent diffusion backbone, it introduces hierarchical prompts, pose conditioning, and per-attribute sub-stages, enabling both draft generation and sequential local editing. The authors curate the HieraFashion dataset with 5200 full-body fashion images and hierarchical captions to train and evaluate the model, and demonstrate superior fidelity, diversity, and prompt adherence compared to prior methods for both generation and editing. Ablation studies validate the importance of hierarchical prompts, attribute ordering, pose conditioning, and body-part masks, highlighting the approach's potential to augment practical fashion design pipelines.

Abstract

Fashion design is a challenging and complex process. Recent works on fashion generation and editing are all agnostic of the actual fashion design process, which limits their usage in practice. In this paper, we propose a novel hierarchical diffusion-based framework tailored for fashion design, coined HieraFashDiff. Our model is designed to mimic the practical fashion design workflow by unraveling the denoising process into two successive stages: 1) an ideation stage that generates design proposals given high-level concepts and 2) an iteration stage that continuously refines the proposals using low-level attributes. Our model supports fashion design generation and fine-grained local editing in a single framework. To train our model, we contribute a new dataset of full-body fashion images annotated with hierarchical text descriptions. Extensive evaluations show that, compared to prior approaches, our method can generate fashion designs and edited results with higher fidelity and better prompt adherence, showing its promising potential to augment the practical fashion design workflow. Code and dataset are available at https://github.com/haoli-zbdbc/hierafashdiff.
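
As a rough illustration of the two-stage idea described above, the following Python sketch partitions a denoising schedule into one ideation stage followed by one sub-stage per attribute, and looks up which prompt would condition a given step. It is not the authors' implementation: the names `Stage`, `build_schedule`, and `prompt_for_step`, the `ideation_fraction` parameter, and the even split of the remaining steps across attributes are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    name: str    # "ideation" or "attribute_<i>" (hypothetical labels)
    prompt: str  # text prompt conditioning this stage
    first: int   # highest (noisiest) timestep in the stage, inclusive
    last: int    # lowest timestep in the stage, inclusive


def build_schedule(num_steps: int, concept_prompt: str,
                   attribute_prompts: List[str],
                   ideation_fraction: float = 0.5) -> List[Stage]:
    """Split timesteps num_steps-1 .. 0 into an ideation stage followed by
    one sub-stage per attribute. The split rule is an assumption."""
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    if not 0.0 < ideation_fraction <= 1.0:
        raise ValueError("ideation_fraction must be in (0, 1]")
    k = len(attribute_prompts)
    # Without attributes, the ideation stage covers the whole schedule.
    n_ideation = num_steps if k == 0 else int(num_steps * ideation_fraction)
    n_iteration = num_steps - n_ideation
    if n_ideation < 1 or n_iteration < k:
        raise ValueError("too few steps for the requested stages")

    t = num_steps - 1
    stages = [Stage("ideation", concept_prompt, t, t - n_ideation + 1)]
    t -= n_ideation
    for i, prompt in enumerate(attribute_prompts):
        # Spread the iteration steps evenly; earlier attributes get the remainder.
        length = n_iteration // k + (1 if i < n_iteration % k else 0)
        stages.append(Stage(f"attribute_{i + 1}", prompt, t, t - length + 1))
        t -= length
    return stages


def prompt_for_step(schedule: List[Stage], t: int) -> str:
    """Return the prompt of the stage that contains timestep t."""
    for stage in schedule:
        if stage.last <= t <= stage.first:
            return stage.prompt
    raise ValueError(f"timestep {t} is outside the schedule")


if __name__ == "__main__":
    schedule = build_schedule(
        num_steps=10,
        concept_prompt="elegant evening wear",
        attribute_prompts=["long dress", "sleeveless"],
    )
    for stage in schedule:
        print(stage)
```

In a real sampler, the prompt returned by `prompt_for_step` would be encoded and passed to the denoising network at that step; the pose conditioning and body-part masks mentioned above are outside the scope of this sketch.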

Paper Structure (12 sections, 3 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: The proposed HieraFashDiff can generate fashion design drafts from abstract concepts alone (blue text) and supports iterative local editing of the generated draft through a few apparel attribute descriptions (red text). Our method can thus facilitate the typical fashion design workflow by enabling efficient ideation and rapid iteration.
  • Figure 2: Overview of our method. (a) The denoising process of our model is decomposed into an ideation stage and an iteration stage, which are conditioned on high-level concepts and low-level attributes, respectively. (b) Our editing method starts from the generated design draft $x^C$ and produces a sequence of edited results $(x^{A_1}, x^{A_2}, \dots)$ given text prompts for different attributes $(A_1, A_2, \dots)$. (c) Our UNet-based denoising network is conditioned on additional pose information.
  • Figure 3: Qualitative results of different methods for fashion draft generation from high-level design concepts.
  • Figure 4: Qualitative comparison of iterative local editing. The latest editing methods often lack alignment with low-level attribute semantics or cause undesirable global changes. Our method precisely edits the regions that correspond to the attribute descriptions while keeping the other regions unchanged. TexFit-M refers to TexFit with our body-part masks rather than its predicted ones.
  • Figure 5: Comparison of our hierarchical model (Ours) against its flat (Flat) and random attribute ordering (RandAttrOrder) variants for local editing (long dress $\rightarrow$ short dress).
  • ...and 1 more figures