Preference Alignment for Diffusion Model via Explicit Denoised Distribution Estimation
Dingyuan Shi, Yong Wang, Hangyu Li, Xiangxiang Chu
TL;DR
This work introduces Denoised Distribution Estimation (DDE), a direct preference optimization framework for diffusion models. It addresses the fact that preference labels exist only for the final output of a denoising trajectory by explicitly linking intermediate denoising steps to the terminal distribution $p_\theta(x_0)$. DDE uses two complementary estimation strategies: stepwise estimation for the upper trajectory segment and single-shot DDIM-based estimation for the final segment. Together they form a unified loss that naturally emphasizes the middle denoising steps when assigning credit. The method is both effective and efficient, achieving state-of-the-art quantitative and qualitative results on Stable Diffusion 1.5 (SD15) and SDXL without auxiliary reward models. The findings suggest a principled way to prioritize optimization of the middle of the trajectory, with broader implications for preference alignment in diffusion-based generation systems.
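The TL;DR describes DDE as a direct preference optimization framework whose loss is defined on the terminal distribution $p_\theta(x_0)$. For orientation only, the sketch below shows the standard pairwise DPO objective written in terms of terminal log-likelihoods for a preferred image $x_0^w$ and a rejected image $x_0^l$. It is an assumption-laden illustration, not the paper's loss. DDE's contribution lies in how $\log p_\theta(x_0)$ is estimated from intermediate denoising steps, and this sketch does not reproduce that. The function names and the `beta` value are illustrative.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z)) that avoids overflow for large |z|."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 1.0) -> float:
    """Standard pairwise DPO loss for one preference pair (x_0^w, x_0^l).

    logp_w, logp_l: log p_theta(x_0) of the preferred (w) and rejected (l)
        terminal images under the model being trained.
    ref_logp_w, ref_logp_l: the same quantities under a frozen reference model.
    beta: controls how strongly the model is kept close to the reference.

    Assumption: the four log-likelihoods are given as plain numbers. In DDE
    they would come from the paper's denoised-distribution estimates, which
    this sketch does not reproduce.
    """
    # Implicit reward margin: how much more the model (relative to the
    # reference) favors the preferred image over the rejected one.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -log_sigmoid(beta * margin)


# Usage: with zero margin the loss is log(2); a positive margin lowers it.
print(round(dpo_loss(0.0, 0.0, 0.0, 0.0), 4))  # → 0.6931
print(dpo_loss(-10.0, -12.0, -11.0, -11.0, beta=0.5) < math.log(2))  # → True
```

The loss depends only on the difference of log-likelihood ratios, so any estimator of $\log p_\theta(x_0)$ can be plugged in. In that sense the estimator, not the outer objective, determines how credit reaches individual denoising steps.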
Abstract
Diffusion models have shown remarkable success in text-to-image generation, making preference alignment for these models increasingly important. Preference labels, however, are typically available only at the end of the denoising trajectory, which makes it difficult to optimize the intermediate denoising steps. In this paper, we propose Denoised Distribution Estimation (DDE), which explicitly connects intermediate steps to the terminal denoised distribution, so that preference labels can be used to optimize the entire trajectory. To this end, we design two estimation strategies for DDE. The first is stepwise estimation, which uses the conditional denoised distribution to estimate the model's denoised distribution. The second is single-shot estimation, which converts the model output into the terminal denoised distribution via DDIM modeling. Analytically and empirically, we show that DDE equipped with these two estimation strategies naturally yields a novel credit assignment scheme that prioritizes optimizing the middle part of the denoising trajectory. Extensive experiments demonstrate that our approach achieves superior performance, both quantitatively and qualitatively.
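The single-shot strategy converts the model output into the terminal denoised distribution via DDIM modeling. Under the standard ε-prediction parameterization, the core of such a conversion is the well-known DDIM one-step estimate of the clean sample, $\hat{x}_0 = (x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t, t))/\sqrt{\bar\alpha_t}$. The sketch below assumes a linear noise schedule and list-based arrays. The helper names `alpha_bar`, `forward_noise`, and `ddim_x0_estimate` are hypothetical, and the step by which DDE turns this point estimate into a distribution over $x_0$ is not reproduced.

```python
import math
from typing import List


def alpha_bar(t: int, T: int = 1000,
              beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Cumulative product prod_{s=1..t} (1 - beta_s) under a linear beta schedule.

    Assumption: the linear DDPM schedule is used purely for illustration;
    Stable Diffusion models use a different ("scaled linear") schedule.
    """
    prod = 1.0
    for s in range(1, t + 1):
        beta_s = beta_start + (beta_end - beta_start) * (s - 1) / (T - 1)
        prod *= 1.0 - beta_s
    return prod


def forward_noise(x0: List[float], eps: List[float], a_bar: float) -> List[float]:
    """Forward process: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps."""
    return [math.sqrt(a_bar) * a + math.sqrt(1.0 - a_bar) * e
            for a, e in zip(x0, eps)]


def ddim_x0_estimate(x_t: List[float], eps_pred: List[float],
                     a_bar: float) -> List[float]:
    """Single-shot DDIM estimate of the clean sample from x_t.

    x0_hat = (x_t - sqrt(1 - a_bar) * eps_pred) / sqrt(a_bar),
    where eps_pred plays the role of the network output eps_theta(x_t, t).
    """
    return [(x - math.sqrt(1.0 - a_bar) * e) / math.sqrt(a_bar)
            for x, e in zip(x_t, eps_pred)]


# Usage: if the noise prediction were perfect, the estimate recovers x_0.
x0 = [0.5, -1.0, 2.0]
eps = [0.1, -0.3, 0.7]
a = alpha_bar(500)
x_t = forward_noise(x0, eps, a)
print([round(v, 6) for v in ddim_x0_estimate(x_t, eps, a)])  # → [0.5, -1.0, 2.0]
```

With a trained network, `eps_pred` is the model output $\epsilon_\theta(x_t, t)$ and $\hat{x}_0$ is only an approximation of $x_0$. The single-shot strategy described in the abstract goes further and converts this output into a distribution over $x_0$, a step the sketch leaves out.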
