Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models
Min Cheng, Fatemeh Doudi, Dileep Kalathil, Mohammad Ghavamzadeh, Panganamala R. Kumar
TL;DR
Diffusion Blend performs inference-time multi-preference alignment for diffusion models by blending the backward diffusion processes associated with basis rewards. Two algorithms, DB-MPA (multi-reward alignment) and DB-KLA (KL regularization control), use a Jensen-gap-based approximation to express the target drift as a linear combination of basis drifts, so that a user can specify the reward combination $r(w)$ and the regularization strength $\alpha(\lambda)$ at inference time without additional fine-tuning. Experiments on SDv1.5 with multiple rewards show that DB-MPA and DB-KLA outperform baselines and closely approach the performance of the MORL oracle, while offering smooth, real-time control over outputs. The framework reduces computational cost and enables personalized, policy-driven diffusion generation; memory overhead is noted as an area for future efficiency improvements.
Abstract
Reinforcement learning (RL) algorithms have been used recently to align diffusion models with downstream objectives such as aesthetic quality and text-image consistency by fine-tuning them to maximize a single reward function under a fixed KL regularization. However, this approach is inherently restrictive in practice, where alignment must balance multiple, often conflicting objectives. Moreover, user preferences vary across prompts, individuals, and deployment contexts, with varying tolerances for deviation from a pre-trained base model. We address the problem of inference-time multi-preference alignment: given a set of basis reward functions and a reference KL regularization strength, can we design a fine-tuning procedure so that, at inference time, it can generate images aligned with any user-specified linear combination of rewards and regularization, without requiring additional fine-tuning? We propose Diffusion Blend, a novel approach to solve inference-time multi-preference alignment by blending backward diffusion processes associated with fine-tuned models, and we instantiate this approach with two algorithms: DB-MPA for multi-reward alignment and DB-KLA for KL regularization control. Extensive experiments show that Diffusion Blend algorithms consistently outperform relevant baselines and closely match or exceed the performance of individually fine-tuned models, enabling efficient, user-driven alignment at inference time. The code is available at https://github.com/bluewoods127/DB-2025.
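To make the blending idea concrete, the following is a minimal sketch, not the authors' implementation, of forming a drift as a weighted linear combination of basis drifts and taking one toy update step with it. The names `blend_drifts` and `euler_step`, the stand-in drift functions, the weights, and the step size are all hypothetical and chosen only for illustration. The noise term of the backward process and any time-direction convention are deliberately omitted.

```python
import numpy as np

def blend_drifts(basis_drifts, weights):
    """Return sum_i w_i * f_i for drift arrays f_i of identical shape."""
    drifts = np.stack([np.asarray(f, dtype=float) for f in basis_drifts])
    w = np.asarray(weights, dtype=float)
    if w.shape != (drifts.shape[0],):
        raise ValueError("need exactly one weight per basis drift")
    # Contract over the model axis; the result has the shape of one drift.
    return np.tensordot(w, drifts, axes=1)

def euler_step(x, basis_models, weights, t, dt):
    """One deterministic Euler step using the blended drift (noise omitted)."""
    drifts = [model(x, t) for model in basis_models]
    return x + blend_drifts(drifts, weights) * dt

# Hypothetical stand-ins for the drifts of two fine-tuned models.
def drift_a(x, t):
    return -x          # pulls samples toward 0

def drift_b(x, t):
    return 1.0 - x     # pulls samples toward 1

x0 = np.zeros(3)
user_weights = [0.25, 0.75]  # user-specified preference, chosen at inference
x1 = euler_step(x0, [drift_a, drift_b], user_weights, t=1.0, dt=0.1)
print(x1)
```

In this sketch the preference weights are ordinary function arguments, so changing them requires no retraining of the basis models. Each step does, however, evaluate every basis model, which is one way to read the memory overhead mentioned in the TL;DR.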
