Diffusion Models for Generative Outfit Recommendation

Yiyan Xu; Wenjie Wang; Fuli Feng; Yunshan Ma; Jizhi Zhang; Xiangnan He

TL;DR

This work defines Generative Outfit Recommendation (GOR) and proposes DiFashion, a diffusion-based model that generates multiple fashion images in parallel, conditioned on category prompts, a mutual compatibility condition, and user history, to form cohesive, personalized outfits. By operating in a latent space and extending classifier-free guidance to three conditions, DiFashion targets high fidelity, compatibility within each outfit, and personalization to the user, and it outperforms both generative baselines and retrieval methods on iFashion and Polyvore-U. The approach is validated through quantitative metrics, human evaluations, and ablation studies, which demonstrate the value of parallel multi-image diffusion for customized fashion generation. The work also suggests practical pathways for deploying AIGC-based outfit synthesis in retrieval-enabled shopping and customization pipelines.

Abstract

Outfit Recommendation (OR) in the fashion domain has evolved through two stages: Pre-defined Outfit Recommendation and Personalized Outfit Composition. However, both stages are constrained by existing fashion products, which limits their effectiveness in addressing users' diverse fashion needs. The recent advent of AI-generated content gives OR the opportunity to transcend these limitations and shows the potential of personalized outfit generation and recommendation. To this end, we introduce a novel task called Generative Outfit Recommendation (GOR), which aims to generate a set of fashion images and compose them into a visually compatible outfit tailored to a specific user. The key objectives of GOR are high fidelity, compatibility, and personalization of the generated outfits. To achieve these, we propose a generative outfit recommender model named DiFashion, which enables diffusion models to generate multiple fashion images in parallel. To meet the three objectives, we design three kinds of conditions to guide the parallel generation process and adopt classifier-free guidance to strengthen the alignment between the generated images and the conditions. We apply DiFashion to both the personalized Fill-In-The-Blank and GOR tasks and conduct extensive experiments on the iFashion and Polyvore-U datasets. Quantitative and human-involved qualitative evaluations demonstrate the superiority of DiFashion over competitive baselines.
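To make the idea of classifier-free guidance with three conditions concrete, the following is a minimal sketch of one common way to combine several noise predictions at inference time. It is an illustration under stated assumptions, not the formulation given in the paper: the function name compose_guidance, the order in which the conditions are stacked (category, then mutual, then history), and the scale names s_cat, s_mut, and s_hist are all hypothetical. The inputs can be plain floats or arrays that support elementwise arithmetic.

```python
def compose_guidance(eps_uncond, eps_cat, eps_cat_mut, eps_full,
                     s_cat, s_mut, s_hist):
    """Combine four noise predictions into one guided prediction.

    Assumed (hypothetical) inputs, each from the same denoiser at the same step:
      eps_uncond  : prediction with all three conditions dropped
      eps_cat     : prediction with only the category prompt
      eps_cat_mut : prediction with category prompt + mutual condition
      eps_full    : prediction with all three conditions
    s_cat, s_mut, s_hist are the three guidance scales.
    """
    return (eps_uncond
            + s_cat * (eps_cat - eps_uncond)       # push toward the category
            + s_mut * (eps_cat_mut - eps_cat)      # push toward compatibility
            + s_hist * (eps_full - eps_cat_mut))   # push toward user history


# With every scale set to 1 the terms telescope to the fully conditioned
# prediction; with every scale set to 0 only the unconditional one remains.
guided = compose_guidance(0.0, 1.0, 2.0, 4.0, s_cat=2.0, s_mut=1.0, s_hist=1.0)
```

In this sketch each scale controls how strongly one condition steers the denoising step, so raising a single scale emphasizes that condition without changing the others.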

Paper Structure (25 sections, 15 equations, 8 figures, 6 tables)

Introduction
Preliminary
Generative Outfit Recommendation
Task Formulation
DiFashion
Diffusion Processes
Condition Encoders
Training
Inference
Experiments
Experimental Settings
Datasets.
Baselines.
Evaluation Metrics.
Implementation Details.
...and 10 more sections

Figures (8)

Figure 1: The evolution of Outfit Recommendation. Beyond existing fashion products, GOR aims to generate a set of fashion products as a compatible and personalized outfit.
Figure 2: Demonstration of DiFashion for personalized Fill-In-The-Blank and generative outfit recommendation tasks.
Figure 3: An overview of DiFashion: it gradually corrupts outfit images with Gaussian noise in the forward process, followed by a parallel conditional denoising process to reconstruct these images. The denoising process is guided by three conditions: category prompt, mutual condition, and history condition.
Figure 4: Effects of the mutual influence ratio and three guidance scales.
Figure 5: Effects of the mutual and history conditions. "w/" and "w/o" denote "with" and "without", respectively.
...and 3 more figures
