Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models

Kyuyoung Kim, Jongheon Jeong, Minyong An, Mohammad Ghavamzadeh, Krishnamurthy Dvijotham, Jinwoo Shin, Kimin Lee

TL;DR

This work tackles reward overoptimization, which arises when text-to-image models are fine-tuned against reward models trained on human feedback. It introduces the TIA2 benchmark to evaluate how well reward models agree with human judgments, and proposes TextNorm, a confidence-calibrated reward that normalizes a reward model's scores over a set of semantically contrastive prompts and can be combined across an ensemble of reward models. Experiments with best-of-n sampling, supervised fine-tuning, and reinforcement learning show that TextNorm substantially improves text-image alignment in human evaluations, with some trade-offs in image fidelity under RL. The results suggest that confidence-aware reward design can make reward-based fine-tuning more reliable and less prone to degrading true alignment or image quality.

Abstract

Fine-tuning text-to-image models with reward functions trained on human feedback data has proven effective for aligning model behavior with human intent. However, excessive optimization with such reward models, which serve as mere proxy objectives, can compromise the performance of fine-tuned models, a phenomenon known as reward overoptimization. To investigate this issue in depth, we introduce the Text-Image Alignment Assessment (TIA2) benchmark, which comprises a diverse collection of text prompts, images, and human annotations. Our evaluation of several state-of-the-art reward models on this benchmark reveals their frequent misalignment with human assessment. We empirically demonstrate that overoptimization occurs notably when a poorly aligned reward model is used as the fine-tuning objective. To address this, we propose TextNorm, a simple method that enhances alignment based on a measure of reward model confidence estimated across a set of semantically contrastive text prompts. We demonstrate that incorporating the confidence-calibrated rewards in fine-tuning effectively reduces overoptimization, resulting in twice as many wins in human evaluation for text-image alignment compared against the baseline reward models.
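
The abstract describes TextNorm only at a high level, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes that the confidence measure is a temperature-scaled softmax of the reward for the original prompt against the rewards for its contrastive prompts, and that an ensemble is a plain average of normalized rewards. The function names, the `temperature` parameter, and the example scores are all hypothetical.

```python
import math
from typing import Sequence


def softmax_normalized_reward(
    reward_for_prompt: float,
    rewards_for_contrastive: Sequence[float],
    temperature: float = 1.0,  # assumed hyperparameter, not taken from the source
) -> float:
    """Share of softmax mass assigned to the original prompt.

    A value near 1 means the reward model clearly prefers the original
    prompt over its contrastive variants for this image; a value near
    1 / (number of prompts) means it cannot tell them apart.
    """
    scores = [reward_for_prompt, *rewards_for_contrastive]
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    return exps[0] / sum(exps)


def ensemble_reward(normalized_rewards: Sequence[float]) -> float:
    """Average normalized rewards from several reward models (assumed rule)."""
    return sum(normalized_rewards) / len(normalized_rewards)


# Hypothetical raw scores from one reward model for a single image.
confident = softmax_normalized_reward(5.0, [1.0, 1.0])
ambiguous = softmax_normalized_reward(5.0, [4.9, 5.1])
```

In this sketch the two calls share the same raw reward of 5.0 for the original prompt, yet the second receives a much lower normalized reward because the contrastive prompts score almost as well. Discounting rewards that the model cannot attribute to the intended prompt is the intuition behind confidence calibration as the abstract describes it.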

Paper Structure (26 sections, 5 equations, 10 figures, 19 tables, 1 algorithm)

Figures (10)

  • Figure 1: Images generated using Stable Diffusion v2.1 (Rombach et al., 2022) fine-tuned with CLIP (Radford et al., 2021), BLIP-2 (Li et al., 2023), ImageReward (IR; Xu et al., 2023), and PickScore (PS; Kirstain et al., 2023). Both text-image alignment and image fidelity degrade under excessive optimization. Our proposed method (TextNorm) is robust to overoptimization, as illustrated in the last column.
  • Figure 1: Statistics on the TIA2 benchmark. The benchmark consists of a total of 550 text prompts, 27,500 images, and a set of three human annotations for every text-image pair. See the benchmark appendix for more details.
  • Figure 2: Sample images for which the reward models do not fully agree with human labels.
  • Figure 3: TextNorm normalizes rewards over a set of contrastive prompts generated using an LLM. Combining an ensemble of normalized rewards of multiple models can further enhance alignment.
  • Figure 4: Images sampled using best-of-$n$ for $n \in \{16, 64, 256\}$ with the five reward models.
  • ...and 5 more figures
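
The best-of-n sampling named in the Figure 4 caption can be sketched as follows. This is a generic illustration, not the paper's code: `sample` stands in for a text-to-image generator conditioned on a fixed prompt and `reward` for any of the reward models, and both names are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def best_of_n(sample: Callable[[], T], reward: Callable[[T], float], n: int) -> T:
    """Draw n candidates and return the one the reward model scores highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [sample() for _ in range(n)]
    # max() keeps the first candidate on ties, so the result is deterministic
    # for a given sequence of samples.
    return max(candidates, key=reward)


# Toy usage: integers stand in for images and the reward is the value itself.
toy_samples = iter([3, 7, 5])
chosen = best_of_n(lambda: next(toy_samples), lambda v: float(v), n=3)
```

Because the selection depends entirely on the reward model, a larger n applies stronger optimization pressure against that reward, which is why the caption compares n in {16, 64, 256}.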