Quality and Quantity: Unveiling a Million High-Quality Images for Text-to-Image Synthesis in Fashion Design

Jia Yu; Lichao Zhang; Zijie Chen; Fayu Pan; MiaoMiao Wen; Yuming Yan; Fangsheng Weng; Shuai Zhang; Lili Pan; Zhenzhong Lan

TL;DR

This work presents Fashion-Diffusion, a dataset of over a million high-quality fashion images paired with detailed text descriptions, and proposes a new benchmark comprising multiple datasets for evaluating the performance of fashion design models.

Abstract

The fusion of AI and fashion design has emerged as a promising research area. However, the lack of extensive, interrelated data on clothing and try-on stages has hindered the full potential of AI in this domain. Addressing this, we present the Fashion-Diffusion dataset, a product of multiple years' rigorous effort. This dataset, the first of its kind, comprises over a million high-quality fashion images, paired with detailed text descriptions. Sourced from a diverse range of geographical locations and cultural backgrounds, the dataset encapsulates global fashion trends. The images have been meticulously annotated with fine-grained attributes related to clothing and humans, simplifying the fashion design process into a Text-to-Image (T2I) task. The Fashion-Diffusion dataset not only provides high-quality text-image pairs and diverse human-garment pairs but also serves as a large-scale resource about humans, thereby facilitating research in T2I generation. Moreover, to foster standardization in the T2I-based fashion design field, we propose a new benchmark comprising multiple datasets for evaluating the performance of fashion design models. This work represents a significant leap forward in the realm of AI-driven fashion design, setting a new standard for future research in this field.

Paper Structure (22 sections, 21 figures, 8 tables)

This paper contains 22 sections, 21 figures, 8 tables.

Introduction
Related Work
Fashion Image Datasets
Garment synthesis
Fashion-Diffusion Dataset
Data Collection & Processing
Data Annotation
Statistical Analysis
Descriptive Attribute Distribution
Text-Image Relevance
Experiments
Fashion-Diffusion Benchmark
Generation Results on Fashion-Diffusion
Comparison of Generation Results on Different Datasets
Conclusion
...and 7 more sections

Figures (21)

Figure 1: Overview of Fashion-Diffusion. Our Fashion-Diffusion Dataset contains 1,044,491 high-resolution, high-quality fashion images with 1,593,808 high-quality text descriptions, which include descriptions about both garments and humans.
Figure 2: The annotation workflow for Fashion-Diffusion. Annotation proceeds in three stages, 'Garment and Human Detection', 'Attributes Labelling', and 'Text Generation', to ensure high-quality annotations and accurate, professional text-image information.
Figure 3: Descriptive attribute distribution for the classes 'Fabric', 'Category', 'Color', 'Style', 'Collar' and 'Technology'. We display exemplar real images for specific attributes under each class and provide statistics for the top-10 attributes of each class in the bottom row.
Figure 4: Left: Age and gender distribution across four groups, i.e., 'Women's Clothing', 'Men's Clothing', 'Girls' Clothing', and 'Boys' Clothing', in the Fashion-Diffusion dataset. Right: We collect fashion images of people of a variety of races and skin colors, making our data more representative of global diversity.
Figure 5: Left: Attribute distribution of the 'garment category' class, which describes the type of clothing in the fashion image. Right: Length distribution of prompts describing both the person and the garment in the Fashion-Diffusion dataset.
...and 16 more figures
