Alignment of Diffusion Models: Fundamentals, Challenges, and Future
Buhua Liu, Shitong Shao, Bao Li, Lichen Bai, Zhiqiang Xu, Haoyi Xiong, James Kwok, Sumi Helal, Zeke Xie
TL;DR
The paper surveys alignment of diffusion models, detailing how misalignment with human preferences arises and how reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), and test-time alignment address it. Drawing on LLM alignment, it covers data, algorithms, benchmarks, and cross-domain extensions to video, audio, 3D, and scientific applications, and highlights the trade-offs between training-based and test-time approaches. Key contributions include a structured taxonomy of alignment techniques, a compilation of benchmarks and evaluation metrics, and a forward-looking discussion of challenges and future directions such as pluralistic feedback, data-efficient learning, and self-alignment. The work emphasizes the practical value of reliable, safe, and controllable diffusion models across domains and modalities, and aims to guide researchers and engineers toward robust, human-aligned generative systems.
Abstract
Diffusion models have emerged as the leading paradigm in generative modeling, excelling in various applications. Despite their success, these models are often misaligned with human intentions and generate results with undesired properties or even harmful content. Inspired by the success and popularity of alignment in tuning large language models, recent studies have investigated aligning diffusion models with human expectations and preferences. This work reviews the alignment of diffusion models, covering advances in the fundamentals of alignment, alignment techniques for diffusion models, preference benchmarks, and evaluation for diffusion models. Moreover, we discuss key perspectives on current challenges and promising future directions for addressing them. To the best of our knowledge, our work is the first comprehensive review paper to help researchers and engineers understand, practice, and research the alignment of diffusion models.
