Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models

Yongjin Yang; Sihyeon Kim; Hojung Jung; Sangmin Bae; SangMook Kim; Se-Young Yun; Kimin Lee

TL;DR

FiFA presents an automated data-filtering framework for efficiently aligning text-to-image diffusion models with human preferences. It selects informative data using three signals: preference margin, text quality, and text diversity. It uses a proxy reward model to estimate preference margins, a large language model (LLM) to score prompt quality, and a $k$-NN-based entropy estimate to encourage prompt diversity. It then combines these signals into a tractable objective whose approximate solution selects the top-K data pairs. Experiments on SD1.5 and SDXL show that FiFA converges faster and achieves better human-perceived quality while using far less data and far fewer GPU hours, and that it also reduces harmful outputs. The approach makes alignment of large-scale diffusion models more practical and points to extensions such as online DPO and policy-gradient methods.
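The selection rule summarized above can be sketched as a per-pair importance score followed by top-K selection. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions the text does not state. The score is a weighted sum with hypothetical weights `alpha` and `gamma`. The margin is the signed difference of proxy rewards (chosen minus rejected). Text quality is a precomputed LLM score. Each pair's diversity contribution is the log distance from its prompt embedding to its k-th nearest neighbour, which is the per-sample term of a k-NN entropy estimate. All class and function names are hypothetical.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class FeedbackPair:
    """A prompt with one preferred and one dispreferred image (hypothetical schema)."""
    prompt_embedding: list[float]  # text embedding of the prompt (encoder is an assumption)
    reward_chosen: float           # proxy reward of the human-preferred image
    reward_rejected: float         # proxy reward of the other image
    quality: float                 # prompt-quality score, assumed precomputed by an LLM


def kth_neighbor_distance(i: int, embeddings: list[list[float]], k: int) -> float:
    """Euclidean distance from embeddings[i] to its k-th nearest other embedding."""
    dists = sorted(math.dist(embeddings[i], e) for j, e in enumerate(embeddings) if j != i)
    return dists[k - 1]


def select_top_k(pairs: list[FeedbackPair], top_k: int, alpha: float = 1.0,
                 gamma: float = 1.0, k: int = 1, eps: float = 1e-12) -> list[int]:
    """Return indices of the top_k pairs under a hypothetical importance score."""
    embeddings = [p.prompt_embedding for p in pairs]
    scores = []
    for i, p in enumerate(pairs):
        # Assumed signed margin: pairs where the proxy disagrees with the human label score low.
        margin = p.reward_chosen - p.reward_rejected
        # Per-sample k-NN entropy term: isolated prompts contribute more diversity.
        diversity = math.log(kth_neighbor_distance(i, embeddings, k) + eps)
        scores.append(margin + alpha * p.quality + gamma * diversity)
    # Brute-force neighbour search keeps the sketch short; it does not scale to large datasets.
    return heapq.nlargest(top_k, range(len(pairs)), key=scores.__getitem__)
```

Each pair is scored once against the full pool, so the diversity term only approximates the entropy of the selected subset. This mirrors the idea of approximating the optimization with per-pair importance scores rather than searching over subsets directly.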

Abstract

Fine-tuning text-to-image diffusion models with human feedback is an effective method for aligning model behavior with human intentions. However, this alignment process often converges slowly because human feedback datasets are large and noisy. In this work, we propose FiFA, a novel automated data filtering algorithm designed to improve the fine-tuning of diffusion models on human feedback datasets with direct preference optimization (DPO). Specifically, our approach selects data by solving an optimization problem that maximizes three components: preference margin, text quality, and text diversity. The preference margin, calculated using a proxy reward model, identifies samples that remain highly informative despite the noisy nature of feedback datasets. Additionally, we incorporate text quality, assessed by large language models, to prevent harmful content, and we account for text diversity through a k-nearest neighbor entropy estimator to improve generalization. Finally, we integrate all these components into a single optimization problem and approximate its solution by assigning an importance score to each data pair and selecting the highest-scoring pairs. As a result, our method filters data efficiently and automatically, without manual intervention, and can be applied to any large-scale dataset. Experimental results show that FiFA significantly improves training stability and achieves better performance. Its outputs are preferred by humans 17% more often, while it uses less than 0.5% of the full data and thus only 1% of the GPU hours required when training on the full human feedback datasets.
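The abstract names a k-nearest neighbor entropy estimator for text diversity but does not spell it out. The sketch below implements the classical Kozachenko-Leonenko estimator over prompt embeddings, assuming Euclidean distance:

$\hat H = \psi(N) - \psi(k) + \log V_d + \frac{d}{N}\sum_i \log r_i$

Here $\psi$ is the digamma function, $V_d$ is the volume of the $d$-dimensional unit ball, and $r_i$ is the distance from point $i$ to its k-th nearest neighbour. Whether FiFA uses exactly this form, and which text encoder produces the embeddings, are assumptions. The helper names are illustrative.

```python
import math


def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / j for j in range(1, n))


def log_unit_ball_volume(d: int) -> float:
    """Log volume of the d-dimensional Euclidean unit ball: pi^(d/2) / Gamma(d/2 + 1)."""
    return (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)


def knn_entropy(points: list[list[float]], k: int = 1) -> float:
    """Kozachenko-Leonenko estimate of differential entropy, in nats."""
    n, d = len(points), len(points[0])
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < number of points")
    total_log_r = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        # Distance to the k-th nearest neighbour; a duplicate point (distance 0)
        # makes math.log raise ValueError.
        total_log_r += math.log(dists[k - 1])
    return digamma_int(n) - digamma_int(k) + log_unit_ball_volume(d) + d * total_log_r / n
```

Duplicate prompts should therefore be deduplicated, or a small epsilon added, before estimation. As a quick sanity check, scaling every embedding by a factor $s$ shifts the estimate by exactly $d \log s$.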
