Preference Alignment on Diffusion Model: A Comprehensive Survey for Image Generation and Editing

Sihao Wu, Xiaonan Si, Chi Xing, Jianhong Wang, Gaojie Jin, Guangliang Cheng, Lijun Zhang, Xiaowei Huang

TL;DR

The paper addresses how to align diffusion-model outputs with human preferences in image generation and editing. It surveys optimization approaches such as reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO), detailing their training signals, data strategies, and modality considerations. It reviews applications across medical imaging, robotics, autonomous driving, and other domains, highlighting domain-specific fine-tuning practices and practical impact. It also discusses key challenges such as computational cost, data collection, and reward design, and suggests directions including LoRA-based fine-tuning, multimodal integration with LLMs/VLMs, and safety considerations to guide future work.

Abstract

The integration of preference alignment with diffusion models (DMs) has emerged as a transformative approach to enhance image generation and editing capabilities. Although integrating diffusion models with preference alignment strategies poses significant challenges for novices at this intersection, comprehensive and systematic reviews of this subject are still notably lacking. To bridge this gap, this paper extensively surveys preference alignment with diffusion models in image generation and editing. First, we systematically review cutting-edge optimization techniques such as reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), and others, highlighting their pivotal role in aligning DMs with preferences. Then, we thoroughly explore the applications of preference-aligned DMs in autonomous driving, medical imaging, robotics, and more. Finally, we comprehensively discuss the challenges of preference alignment with DMs. To our knowledge, this is the first survey centered on preference alignment with DMs, providing insights to drive future innovation in this dynamic area.

Paper Structure

This paper contains 16 sections, 13 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Framework for preference alignment on DMs. A prompt model encodes the prompt into an embedding, and a source image is transformed into a latent representation $z_t$. DMs fine-tuned with RLHF, DPO, and supervised fine-tuning (SFT) process these inputs to generate new images or edit existing ones while preserving structural integrity, so that the outputs align with user preferences.
  • Figure 2: General paradigm of preference alignment for various condition modules on DM applications.