
BalancedDPO: Adaptive Multi-Metric Alignment

Dipesh Tamboli, Souradip Chakraborty, Aditya Malusare, Biplab Banerjee, Amrit Singh Bedi, Vaneet Aggarwal

TL;DR

BalancedDPO tackles the challenge of aligning text-to-image diffusion models with multiple user preferences by moving beyond single-metric optimization. Within the Direct Preference Optimization (DPO) framework, it aggregates feedback from multiple metrics by majority vote and couples this with dynamic reference-model updates to stabilize learning. The approach yields state-of-the-art results on Pick-a-Pic, PartiPrompts, and HPD, delivering balanced improvements across Human Preference Score, CLIP, PickScore, and Aesthetic metrics, and is robust to seed variation and out-of-distribution prompts. This consensus-driven, multi-metric alignment is practical for real-world image generation: it improves prompt adherence and perceived visual quality without substantial changes to the standard DPO pipeline.
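The pairing of a majority vote with a DPO-style loss can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the scorer values and log-likelihoods are made up, a scorer that rates both images equally abstains, pairs with no majority are dropped, and `majority_vote`, `softplus`, and `dpo_loss` are hypothetical helpers. The loss is the generic DPO objective on log-likelihoods; Diffusion-DPO replaces these intractable image log-likelihoods with a denoising-error surrogate, and the dynamic reference-model updates are not modeled here.

```python
import math
from typing import Optional, Sequence


def majority_vote(scores_a: Sequence[float], scores_b: Sequence[float]) -> Optional[int]:
    """Return 0 if image A wins the scorer vote, 1 if image B wins, None on a tie.

    Assumption: each scorer votes for the image it scores strictly higher;
    a scorer that gives both images the same score abstains.
    """
    votes_a = sum(1 for sa, sb in zip(scores_a, scores_b) if sa > sb)
    votes_b = sum(1 for sa, sb in zip(scores_a, scores_b) if sb > sa)
    if votes_a == votes_b:
        return None  # no consensus; this sketch drops such pairs
    return 0 if votes_a > votes_b else 1


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(logp_policy: Sequence[float], logp_ref: Sequence[float],
             winner: int, beta: float = 0.1) -> float:
    """Generic DPO loss -log sigmoid(beta * margin) for one (A, B) image pair.

    logp_policy[i] / logp_ref[i]: log-likelihood of image i under the model
    being trained and under the reference model.
    """
    loser = 1 - winner
    margin = ((logp_policy[winner] - logp_ref[winner])
              - (logp_policy[loser] - logp_ref[loser]))
    # -log(sigmoid(z)) == log(1 + exp(-z)) == softplus(-z)
    return softplus(-beta * margin)


# Hypothetical scores from four scorers for images A and B of one prompt
scores_a = [0.31, 21.5, 0.27, 6.1]
scores_b = [0.29, 21.9, 0.25, 5.8]
winner = majority_vote(scores_a, scores_b)  # 3 votes to 1 -> image A (index 0)
assert winner is not None
loss = dpo_loss(logp_policy=[-10.0, -12.0], logp_ref=[-11.0, -11.5], winner=winner)
print(winner, round(loss, 4))
```

Only the winner/loser assignment comes from the vote; the loss itself is unchanged, which is consistent with the claim that the standard DPO pipeline needs no substantial modification.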

Abstract

Text-to-image (T2I) diffusion models have made remarkable advancements, yet aligning them with diverse preferences remains a persistent challenge. Current methods often optimize single metrics or depend on narrowly curated datasets, leading to overfitting and limited generalization across key visual quality metrics. We present BalancedDPO, a novel extension of Direct Preference Optimization (DPO) that addresses these limitations by simultaneously aligning T2I diffusion models with multiple metrics, including human preference, CLIP score, and aesthetic quality. Our key novelty lies in aggregating consensus labels from diverse metrics in the preference distribution space, rather than mixing rewards as existing approaches do; this enables robust and scalable multi-metric alignment while maintaining the simplicity of the standard DPO pipeline. Our evaluations on the Pick-a-Pic, PartiPrompts, and HPD datasets show that BalancedDPO achieves state-of-the-art results, outperforming existing approaches across all major metrics. Compared with DiffusionDPO, BalancedDPO improves the average win rate by 15%, 7.1%, and 10.3% on Pick-a-Pic, PartiPrompts, and HPD, respectively.
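The contrast between consensus labels and reward mixing can be illustrated with a toy example. This is a hedged sketch, not the paper's code: "reward mixing" here means an equal-weight sum of raw scores (real reward-mixing methods may normalize scores or tune weights), and the scores and helper names (`vote_winner`, `mixed_winner`, `rescale_first`) are hypothetical. It shows that the mixed-reward winner flips when one scorer's scale changes, while the vote does not.

```python
from typing import List, Optional, Sequence


def vote_winner(scores_a: Sequence[float], scores_b: Sequence[float]) -> Optional[int]:
    """Consensus label: each scorer votes for the image it rates higher."""
    votes_a = sum(sa > sb for sa, sb in zip(scores_a, scores_b))
    votes_b = sum(sb > sa for sa, sb in zip(scores_a, scores_b))
    if votes_a == votes_b:
        return None
    return 0 if votes_a > votes_b else 1


def mixed_winner(scores_a: Sequence[float], scores_b: Sequence[float],
                 weights: Sequence[float]) -> Optional[int]:
    """Reward mixing: compare weighted sums of the raw scores."""
    reward_a = sum(w * s for w, s in zip(weights, scores_a))
    reward_b = sum(w * s for w, s in zip(weights, scores_b))
    if reward_a == reward_b:
        return None
    return 0 if reward_a > reward_b else 1


def rescale_first(scores: Sequence[float], factor: float) -> List[float]:
    """Put scorer 0 on a different (but order-preserving) scale."""
    return [scores[0] * factor] + list(scores[1:])


# Hypothetical scores: scorer 0 reports on a 1-100 scale, scorers 1-2 on 0-1
a = [60.0, 0.90, 0.80]
b = [62.0, 0.40, 0.30]
equal = [1.0, 1.0, 1.0]

print("raw scales:    vote ->", vote_winner(a, b), " mix ->", mixed_winner(a, b, equal))
a_unit, b_unit = rescale_first(a, 0.01), rescale_first(b, 0.01)
print("scorer 0 /100: vote ->", vote_winner(a_unit, b_unit),
      " mix ->", mixed_winner(a_unit, b_unit, equal))
```

Because each vote depends only on which image a scorer ranks higher, any strictly increasing rescaling of a scorer leaves the consensus label unchanged; the weighted sum, by contrast, is dominated by whichever scorer has the largest numeric range.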

Paper Structure

This paper contains 28 sections, 15 equations, 11 figures, 6 tables, and 1 algorithm.

Figures (11)

  • Figure 1: This figure illustrates the scoring and consensus-based decision process that our method (BalancedDPO) uses to determine the winner of a pair of images generated for the prompt "A cyberpunk golden retriever is coding." Scorers ($S_1$ to $S_n$) evaluate each image on specific criteria, such as prompt alignment (e.g., $S_1$) or aesthetic appeal (e.g., $S_2$). Each scorer casts a vote for the image it rates higher, and the image with the majority of votes is declared the winner. Because only each scorer's ranking of the two images matters, this approach avoids the biases introduced when scorers use different scales (e.g., one scorer uses a range of 1–100 while another uses 0–1) and normalization is difficult, as demonstrated in our experiments.
  • Figure 2: Comparison of images generated by models trained on image-text pairs from the Pick-a-Pic dataset with preference labels derived from different scoring metrics (Aesthetics schuhmann2022laionaesthetics, CLIP ramesh2022hierarchical, HPS hpsv2, PickScore kirstain2023pick, Pick-a-Pic labels kirstain2023pick) and by BalancedDPO (combining all metrics), across four prompts: "a boy playing chess", "A Pixar style blue rabbit", "person riding a shark", and "pirate guinea pig". Single-metric models often fail on either aesthetics or prompt alignment. For instance, the Aesthetics model generates a cartoonish "shark rider," while the model trained on Pick-a-Pic labels produces a realistic but incomplete image. In contrast, BalancedDPO achieves superior performance across all four prompts, highlighting the benefit of multi-metric optimization.
  • Figure 3: Pick-a-Pic kirstain2023pick comparison for SDXL. Comparison of images generated by various SDXL fine-tunes and BalancedDPO (Ours) on the Pick-a-Pic dataset across diverse prompts. BalancedDPO generally creates images that are more realistic and have finer details, and that are also superior in prompt alignment and visual appeal. The columns emphasize BalancedDPO's strengths in background details, object details, prompt adherence, and overall aesthetics. The values displayed underneath each image indicate PickScore (P), Human Preference Score (H), CLIP score (C), and Aesthetic score (A).
  • Figure 4: Pick-a-Pic kirstain2023pick comparison. Comparison of images generated by SD1.5, DiffusionDPO, and BalancedDPO (Ours) across various prompts. BalancedDPO consistently produces more realistic and detailed outputs, outperforming the other models in prompt alignment and visual appeal. Each column highlights BalancedDPO's superior performance in aspects like facial detail, dynamic motion, adherence to prompt details, and image reflection. The scores below each image represent PickScore (P), Human Preference Score (H), CLIP score (C), and Aesthetic score (A).
  • Figure 5: PartiPrompts partiprompts comparison. Comparison of images generated by SD1.5, DiffusionDPO, and BalancedDPO (Ours) on out-of-distribution prompts from the PartiPrompts dataset. BalancedDPO consistently generates more accurate and realistic outputs, including specific elements like a helicopter, a microphone, and a lifelike dog, while the other models produce incomplete or irrelevant results.
  • ...and 6 more figures
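The abstract reports gains in average win rate, and the figures report PickScore (P), Human Preference Score (H), CLIP score (C), and Aesthetic score (A) for each image. One plausible way to turn such per-prompt scores into an average win rate is sketched below; the paper's exact definition may differ. The tie rule (half a win), the unweighted mean over metrics, the helper names `win_rate` and `average_win_rate`, and all scores are assumptions for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple


def win_rate(ours: List[float], baseline: List[float]) -> float:
    """Fraction of prompts on which `ours` scores higher; ties count as half."""
    wins = sum(1.0 if o > b else 0.5 if o == b else 0.0
               for o, b in zip(ours, baseline))
    return wins / len(ours)


def average_win_rate(ours: Dict[str, List[float]],
                     baseline: Dict[str, List[float]]) -> Tuple[float, Dict[str, float]]:
    """Per-metric win rates and their unweighted mean across metrics."""
    per_metric = {m: win_rate(ours[m], baseline[m]) for m in ours}
    return mean(list(per_metric.values())), per_metric


# Hypothetical per-prompt scores (4 prompts) for a fine-tuned model and a baseline
ours = {"P": [21.5, 22.0, 20.9, 21.8], "H": [0.28, 0.27, 0.29, 0.26],
        "C": [0.33, 0.31, 0.30, 0.35], "A": [6.0, 5.5, 5.9, 6.2]}
baseline = {"P": [21.0, 22.1, 20.5, 21.8], "H": [0.27, 0.26, 0.28, 0.27],
            "C": [0.32, 0.32, 0.29, 0.34], "A": [5.8, 5.6, 5.7, 6.0]}

avg, per_metric = average_win_rate(ours, baseline)
print(avg, per_metric)
```

Averaging per-metric win rates, rather than raw scores, keeps each metric's contribution on the same 0 to 1 footing regardless of its native scale.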