DreamBoothDPO: Improving Personalized Generation using Direct Preference Optimization

Shamil Ayupov; Maksim Nakhodnov; Anastasia Yaschenko; Andrey Kuznetsov; Aibek Alanov

TL;DR

This work addresses the trade-off between concept fidelity and prompt alignment in personalized diffusion models. It adapts Direct Preference Optimization (DPO) to train on better-worse pairs that are built automatically from the model's own outputs and ranked with external quality metrics, so no manual labeling is needed. An angle-based filtering step and a multi-step training scheme direct updates toward a chosen region of the trade-off space, allowing controllable emphasis on concept fidelity, prompt adherence, or a balance of the two. Experiments on SD2 and SDXL backbones show improvements in both Image Similarity and Text Similarity that surpass the baseline Pareto frontier, and a user study confirms the results. The approach offers a scalable, automated, and tunable framework for personalized diffusion at a modest additional computational cost relative to standard fine-tuning pipelines.

Abstract

Personalized diffusion models have shown remarkable success in Text-to-Image (T2I) generation by enabling the injection of user-defined concepts into diverse contexts. However, balancing concept fidelity with contextual alignment remains a challenging open problem. In this work, we propose an RL-based approach that leverages the diverse outputs of T2I models to address this issue. Our method eliminates the need for human-annotated scores by generating a synthetic paired dataset for DPO-like training using external quality metrics. These better-worse pairs are specifically constructed to improve both concept fidelity and prompt adherence. Moreover, our approach supports flexible adjustment of the trade-off between image fidelity and textual alignment. Through multi-step training, our approach outperforms a naive baseline in convergence speed and output quality. We conduct extensive qualitative and quantitative analysis, demonstrating the effectiveness of our method across various architectures and fine-tuning techniques. The source code can be found at https://github.com/ControlGenAI/DreamBoothDPO.
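To make the pair-construction idea concrete, the following is a minimal sketch of how better-worse pairs could be selected from scored samples with an angle-based filter. It is an illustration under stated assumptions, not the authors' implementation: the function name `select_pairs`, the parameters `target_angle_deg`, `max_dev_deg`, and `min_gap`, and the exact filtering rule are all hypothetical. Each generated sample is assumed to carry an (Image Similarity, Text Similarity) score pair from external metrics; a pair is kept when the score difference from the worse to the better sample points within a tolerance of a target direction in that 2D space.

```python
import math
from itertools import combinations


def select_pairs(scores, target_angle_deg=45.0, max_dev_deg=30.0, min_gap=1e-3):
    """Pick (better_idx, worse_idx) pairs from scored samples.

    scores: list of (image_similarity, text_similarity) tuples, one per sample.
    target_angle_deg: desired improvement direction (hypothetical convention:
        0 favors image similarity, 90 favors text similarity, 45 is balanced).
    max_dev_deg: allowed angular deviation from the target; keep it below 90
        so that at most one ordering of each pair can pass.
    min_gap: pairs whose scores are nearly identical are skipped.
    """
    pairs = []
    for i, j in combinations(range(len(scores)), 2):
        d_is = scores[i][0] - scores[j][0]
        d_ts = scores[i][1] - scores[j][1]
        if math.hypot(d_is, d_ts) < min_gap:
            continue  # no meaningful preference signal
        # Try both orderings: i better than j, then j better than i.
        for better, worse, dx, dy in ((i, j, d_is, d_ts), (j, i, -d_is, -d_ts)):
            angle = math.degrees(math.atan2(dy, dx))
            # Smallest absolute angular difference, in [0, 180].
            dev = abs((angle - target_angle_deg + 180.0) % 360.0 - 180.0)
            if dev <= max_dev_deg:
                pairs.append((better, worse))
                break
    return pairs


# Toy scores (made up for illustration): (image similarity, text similarity).
scores = [(0.60, 0.25), (0.70, 0.30), (0.80, 0.20)]
balanced = select_pairs(scores, target_angle_deg=45.0)
fidelity_first = select_pairs(scores, target_angle_deg=0.0)
```

Under these assumptions, changing `target_angle_deg` is what shifts the emphasis between concept fidelity and prompt adherence; the selected pairs would then feed a DPO-style preference loss.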
