
FashionDPO: Fine-tune Fashion Outfit Generation Model using Direct Preference Optimization

Mingzhe Yu, Yunshan Ma, Lei Wu, Changshuo Wang, Xue Li, Lei Meng

TL;DR

FashionDPO addresses the lack of diversity in personalized outfit generation by fine-tuning a pre-trained diffusion-based generator with direct preference optimization (DPO). It introduces a multi-expert feedback loop (quality, compatibility, personalization) that produces positive-negative pairs to guide learning without a handcrafted reward function, implemented via LoRA-based fine-tuning over saved diffusion timesteps. On the personalized fill-in-the-blank (PFITB) and generative outfit recommendation (GOR) tasks on iFashion and Polyvore-U, FashionDPO achieves higher diversity (IS, IS-acc) and better alignment with user preferences and fashion compatibility than strong baselines such as DiFashion. The authors report that the framework generalizes well and is practical and scalable, offering a cost-efficient way to integrate expert-like feedback into generative fashion systems; richer feedback is left to future work.
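The TL;DR does not say how the three expert scores become positive-negative pairs. The sketch below shows one simple possibility, assuming each generated candidate receives a score in [0, 1] from each expert and that the scores are combined by a weighted sum. The `Candidate` type, the weights, and the `min_gap` threshold are hypothetical illustrations, not the paper's method.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Candidate:
    # Hypothetical container: one generated fashion item and its expert scores.
    image_id: str
    quality: float          # assumed score from a quality expert, in [0, 1]
    compatibility: float    # assumed score from a compatibility expert, in [0, 1]
    personalization: float  # assumed score from a personalization expert, in [0, 1]


def combined_score(c: Candidate, weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three expert scores (an assumed aggregation rule)."""
    wq, wc, wp = weights
    return wq * c.quality + wc * c.compatibility + wp * c.personalization


def make_preference_pair(
    candidates: Sequence[Candidate],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
    min_gap: float = 0.0,
) -> Optional[Tuple[Candidate, Candidate]]:
    """Return (positive, negative), or None if no usable pair exists."""
    if len(candidates) < 2:
        return None
    # Rank candidates from best to worst by their combined expert score.
    ranked = sorted(candidates, key=lambda c: combined_score(c, weights), reverse=True)
    best, worst = ranked[0], ranked[-1]
    # Skip near-ties: the experts do not clearly prefer one candidate.
    if combined_score(best, weights) - combined_score(worst, weights) <= min_gap:
        return None
    return best, worst
```

Returning `None` for near-ties is one way to avoid training on pairs that carry little preference signal; raising `min_gap` trades fewer pairs for cleaner ones.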

Abstract

Personalized outfit generation aims to construct a set of compatible and personalized fashion items as an outfit. Recently, generative AI models have received widespread attention, as they can generate fashion items for users to complete an incomplete outfit or create a complete outfit. However, these models lack diversity and rely on the supervised learning paradigm. Recognizing this gap, we propose a novel framework, FashionDPO, which fine-tunes the fashion outfit generation model using direct preference optimization. This framework aims to provide a general fine-tuning approach for fashion generative models, refining a pre-trained fashion outfit generation model with automatically generated feedback, without the need to design a task-specific reward function. To ensure that the feedback is comprehensive and objective, we design a multi-expert feedback generation module that covers three evaluation perspectives, i.e., quality, compatibility, and personalization. Experiments on two established datasets, i.e., iFashion and Polyvore-U, demonstrate the effectiveness of our framework in enhancing the model's ability to align with users' personalized preferences while adhering to fashion compatibility principles. Our code and model checkpoints are available at https://github.com/Yzcreator/FashionDPO.
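The abstract names direct preference optimization as the fine-tuning objective but does not state the loss. As a reference point, the sketch below implements the standard DPO loss of Rafailov et al. (2023) for one preference pair; it is not taken from the FashionDPO paper. For a diffusion model, the log-probabilities would have to come from the denoising steps, and averaging a per-step loss over saved timesteps, as the hypothetical `timestep_dpo_loss` does, is an assumption suggested by the TL;DR's mention of saved diffusion timesteps.

```python
import math
from typing import Sequence, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (preferred w, dispreferred l) pair.

    logp_*: log-probabilities under the model being fine-tuned.
    ref_logp_*: log-probabilities under the frozen reference model.
    beta: strength of the implicit constraint toward the reference model.
    """
    # Implicit reward margin: how much more the fine-tuned model favors w
    # over l, relative to the reference model.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


def timestep_dpo_loss(steps: Sequence[Tuple[float, float, float, float]],
                      beta: float = 0.1) -> float:
    """Hypothetical: average the DPO loss over saved denoising timesteps.

    Each entry is (logp_w, logp_l, ref_logp_w, ref_logp_l) for one timestep.
    """
    if not steps:
        raise ValueError("need at least one saved timestep")
    return sum(dpo_loss(*s, beta=beta) for s in steps) / len(steps)
```

When the fine-tuned model still matches the reference, the margin is zero and the loss equals log 2; it falls below log 2 as the model shifts probability toward the preferred sample.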

Paper Structure

This paper contains 25 sections, 18 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: Illustration of our motivation and a paradigm comparison between FashionDPO and supervised learning methods: FashionDPO optimizes the model using feedback from multiple experts without relying on a labeled dataset, achieving high diversity while maintaining high quality.
  • Figure 2: The overview of FashionDPO, which consists of three consecutive key modules: 1) Fashion Image Generation without Feedback, 2) Feedback Generation from Multiple Experts, and 3) Model Fine-tuning with Direct Preference Optimization.
  • Figure 3: Epoch-wise comparison of FashionDPO's performance across different fine-tuning epochs. As epochs increase, the compatibility metric indicates that the generated fashion items better match the incomplete outfit.
  • Figure 4: Model-wise comparison of different models' generative capabilities: PFITB task above the line, GOR task below.
  • Figure 5: We fine-tune our model on subsets of $n$ outfits, where $n \in \{100, 500, 800, 1000, 2000\}$, to explore the impact of dataset size on model performance. Lines represent models fine-tuned on different subsets, with epochs on the x-axis and the evaluation metric on the y-axis. Bars show the per-epoch time cost (inference, feedback, fine-tuning) for each subset.
  • ...and 2 more figures
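Figure 2 describes three consecutive modules, and Figure 5 reports per-epoch time for inference, feedback, and fine-tuning. A minimal sketch of that epoch structure follows; `run_epoch` and its callable parameters (`generate`, `build_pair`, `finetune`) are hypothetical placeholders, not the released FashionDPO code.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Pair = Tuple[Any, Any]


def run_epoch(
    outfits: Sequence[Any],
    generate: Callable[[Any], Any],
    build_pair: Callable[[Any], Optional[Pair]],
    finetune: Callable[[List[Pair]], None],
) -> Dict[str, float]:
    """Run one hypothetical FashionDPO-style epoch and time each stage."""
    t0 = time.perf_counter()
    # Module 1: generate fashion items for every outfit, without feedback.
    samples = [generate(outfit) for outfit in outfits]
    t1 = time.perf_counter()
    # Module 2: score samples with the experts and keep usable preference pairs.
    pairs: List[Pair] = []
    for sample in samples:
        pair = build_pair(sample)
        if pair is not None:
            pairs.append(pair)
    t2 = time.perf_counter()
    # Module 3: update the generator (e.g., LoRA weights) on the collected pairs.
    finetune(pairs)
    t3 = time.perf_counter()
    return {
        "inference_s": t1 - t0,
        "feedback_s": t2 - t1,
        "fine_tuning_s": t3 - t2,
        "num_pairs": float(len(pairs)),
    }
```

Keeping the three stages behind separate callables makes it straightforward to measure each stage's cost per epoch, which is the breakdown Figure 5 plots.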