Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models

Yongjin Yang, Sihyeon Kim, Hojung Jung, Sangmin Bae, SangMook Kim, Se-Young Yun, Kimin Lee

TL;DR

FiFA is an automated data-filtering framework for efficiently aligning text-to-image diffusion models with human preferences. It selects informative data using three signals: preference margin, text quality, and text diversity. A proxy reward model estimates the margins, an LLM scores prompt quality, and a $k$-NN-based entropy proxy encourages diversity; these are combined into a tractable objective that selects the top-$K$ data pairs. Experiments on SD1.5 and SDXL show that FiFA converges faster and yields better human-perceived quality while using far less data and far fewer GPU hours, and that it reduces harmful outputs. The approach makes alignment of large-scale diffusion models more practical and points to extensions to online DPO and policy-gradient methods.

Abstract

Fine-tuning text-to-image diffusion models with human feedback is an effective method for aligning model behavior with human intentions. However, this alignment process often suffers from slow convergence due to the large size of human feedback datasets and the noise they contain. In this work, we propose FiFA, a novel automated data filtering algorithm designed to enhance the fine-tuning of diffusion models on human feedback datasets with direct preference optimization (DPO). Specifically, our approach selects data by solving an optimization problem that maximizes three components: preference margin, text quality, and text diversity. The preference margin, calculated using a proxy reward model, identifies highly informative samples and thereby addresses the noisy nature of feedback datasets. Additionally, we incorporate text quality, assessed by large language models, to prevent harmful content, and consider text diversity through a k-nearest neighbor entropy estimator to improve generalization. Finally, we integrate all these components into an optimization process and approximate its solution by assigning an importance score to each data pair and selecting the most important ones. As a result, our method filters data efficiently and automatically, without the need for manual intervention, and can be applied to any large-scale dataset. Experimental results show that FiFA significantly enhances training stability and achieves better performance, being preferred by humans 17% more, while using less than 0.5% of the full data and thus 1% of the GPU hours compared to utilizing full human feedback datasets.
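
The selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the additive form of the score, the weights `gamma` and `alpha` (and which term each one scales), the absolute reward gap as the margin, and the log distance to the k-th nearest neighbor as a stand-in for the k-NN entropy estimator are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def knn_log_distance(embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Log distance from each prompt embedding to its k-th nearest neighbor.

    Assumption: used here as a per-sample diversity proxy; prompts in sparse
    regions of embedding space receive larger values.
    """
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)          # a point is not its own neighbor
    kth = np.sort(dists, axis=1)[:, k - 1]   # distance to the k-th neighbor
    return np.log(kth + 1e-12)               # epsilon guards against log(0)

def select_top_k(reward_w, reward_l, text_quality, embeddings, K,
                 gamma=1.0, alpha=1.0, k=5):
    """Score every preference pair and return the indices of the top-K pairs.

    reward_w / reward_l: proxy-reward scores of the preferred / rejected image.
    text_quality: per-prompt quality score (e.g. produced by an LLM).
    embeddings: per-prompt text embeddings, shape (n, d).
    """
    margin = np.abs(np.asarray(reward_w, float) - np.asarray(reward_l, float))
    diversity = knn_log_distance(np.asarray(embeddings, float), k)
    # Assumed additive combination of the three signals.
    score = margin + gamma * np.asarray(text_quality, float) + alpha * diversity
    top = np.argsort(-score)[:K]             # highest score first
    return top, score
```

With `gamma=0` and `alpha=0` the sketch reduces to pure margin-based filtering, which makes it easy to check how each additional term changes the selected subset.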

Paper Structure

This paper contains 53 sections, 2 theorems, 25 equations, 18 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Denoting $\phi_i(\mathbf{c}) := \phi(\mathbf{x}_{0,i}^{w},\mathbf{c}) - \phi(\mathbf{x}_{0,i}^{l},\mathbf{c})$ with feature vector $\phi$. Define $g$ as: where $V(\pi) := \sum \pi(i,\mathbf{c}) \phi_i(\mathbf{c}) \phi_i(\mathbf{c})^\top$ is the design matrix with $\pi:(i,\mathbf{c})\rightarrow [0,1]$ being a probability distribution. Assume $r_i(\mathbf{c})=\phi_i(\mathbf{c})^\top\mathbf{\theta}_

Figures (18)

  • Figure 1: (a) PickScore (Kirstain et al., 2024) at each training step of the SD1.5 model trained on data filtered with FiFA, which uses $0.5\%$ of the data, compared to the model trained on the full dataset. Our method significantly outperforms the alternative, converging faster while requiring about 4x fewer GPU hours to match the performance of the released SD1.5-DPO checkpoint. (b) Qualitative comparison of training on the full data and on data selected with FiFA for various prompts.
  • Figure 2: (a) Qualitative analysis of the preference margin estimated with the PickScore reward model. (b) Distribution of PickScore reward margins on the Pick-a-Pic v2 training set.
  • Figure 3: (a) Examples of harmful outputs when training with the full Pick-a-Pic v2 dataset without considering the quality of text prompts. (b) LLM score and diversity measures of text prompts from subsets of the full Pick-a-Pic v2 dataset using three metrics: word entropy (calculating the entropy of words), semantic diversity (measuring average cosine similarity of embedded text prompts), and singular entropy (entropy of the singular values of the embedded text matrix). When modifying either $\alpha$ or $\gamma$, the other value is fixed at $0$.
  • Figure 4: Human evaluation results. We compare SDXL trained with FiFA against SDXL trained on the full dataset using the HPSv2 benchmark. The SDXL model with FiFA consistently outperforms the SDXL model with the full dataset in terms of both aesthetic quality and text-image alignment, leading to superior overall quality.
  • Figure 5: Samples from the HPSv2 benchmark, generated using a pretrained model, the model trained on the full dataset (DPO + Full), and the model trained using FiFA (DPO + FiFA). Images from the DPO+FiFA model show better alignment to the prompts and higher quality than the others.
  • ...and 13 more figures
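
The three diversity metrics named in the Figure 3 caption (word entropy, semantic diversity, and singular entropy) can be sketched as below. This is an illustrative reading of the caption, not the paper's evaluation code: whitespace tokenization, natural-log Shannon entropy, normalization of singular values by their sum, and the function names are all assumptions. The semantic measure is returned as the raw mean pairwise cosine similarity, so lower values indicate more diverse prompts.

```python
import numpy as np
from collections import Counter

def word_entropy(prompts):
    """Shannon entropy (nats) of the word-frequency distribution over all prompts."""
    counts = Counter(w for p in prompts for w in p.lower().split())
    probs = np.array(list(counts.values()), dtype=float)
    probs /= probs.sum()
    return float(-(probs * np.log(probs)).sum())

def mean_pairwise_cosine(emb):
    """Average cosine similarity over all distinct pairs of prompt embeddings."""
    emb = np.asarray(emb, dtype=float)
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = normed @ normed.T
    off_diag = sims[~np.eye(len(emb), dtype=bool)]  # drop self-similarities
    return float(off_diag.mean())

def singular_entropy(emb):
    """Entropy of the normalized singular values of the embedding matrix."""
    s = np.linalg.svd(np.asarray(emb, dtype=float), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]                                    # skip exact zeros before log
    return float(-(p * np.log(p)).sum())
```

A subset whose embeddings collapse onto one direction yields a singular entropy near zero, whereas embeddings spread evenly over many directions push it toward the log of the matrix rank.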

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2: Kiefer-Wolfowitz
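
For reference, the Kiefer-Wolfowitz theorem is usually stated in the optimal experimental design literature as below. This is the standard textbook form written with the $\phi_i(\mathbf{c})$ and $V(\pi)$ notation of Theorem 1, not a reconstruction of the paper's statement, which may differ in assumptions and notation. The definition of $g$ given here is the one used in the standard theorem and is an assumption with respect to this paper; $d$ denotes the feature dimension.

$$
g(\pi) := \max_{(i,\mathbf{c})} \; \phi_i(\mathbf{c})^\top V(\pi)^{-1} \phi_i(\mathbf{c}),
\qquad
V(\pi) := \sum_{(i,\mathbf{c})} \pi(i,\mathbf{c})\, \phi_i(\mathbf{c})\, \phi_i(\mathbf{c})^\top .
$$

If the vectors $\phi_i(\mathbf{c})$ span $\mathbb{R}^d$, the following are equivalent:

$$
\text{(a)}\;\; \pi^* \in \arg\min_{\pi} g(\pi),
\qquad
\text{(b)}\;\; \pi^* \in \arg\max_{\pi} \log\det V(\pi),
\qquad
\text{(c)}\;\; g(\pi^*) = d .
$$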