Generative Diffusion Modeling: A Practical Handbook

Zihan Ding, Chi Jin

TL;DR

This handbook provides a practical, notation-aligned synthesis of diffusion-model families, unifying diffusion probabilistic models, score-based denoising, consistency models, rectified flow, and TrigFlow under a common framework. It emphasizes bridging the paper-to-code gap through standardized formulations, training objectives, and inference procedures, while outlining post-training techniques such as distillation and reward-based fine-tuning. The work clarifies relationships among methods via unified formulations, velocity mappings, and parameterization dualities, enabling robust implementations and fair comparisons. By focusing on pre-training, distillation, and task-specific fine-tuning, the handbook offers actionable guidance for building scalable, high-quality generative models across images, audio, video, and 3D content.

Abstract

This handbook offers a unified perspective on diffusion models, encompassing diffusion probabilistic models, score-based generative models, consistency models, rectified flow, and related methods. By standardizing notations and aligning them with code implementations, it aims to bridge the "paper-to-code" gap and facilitate robust implementations and fair comparisons. The content encompasses the fundamentals of diffusion models, the pre-training process, and various post-training methods. Post-training techniques include model distillation and reward-based fine-tuning. Designed as a practical guide, it emphasizes clarity and usability over theoretical depth, focusing on widely adopted approaches in generative modeling with diffusion models.

Paper Structure

This paper contains 74 sections, 3 theorems, 142 equations, 14 figures, 10 algorithms.

Key Result

Theorem 2.2

(Song et al., 2023) Define the maximum timestep interval $\Delta t=\max_{n\in[N-1]}|t_{n+1}-t_n|$. Under certain Lipschitz and smoothness conditions, and assuming a ground-truth teacher score model in consistency distillation (CD), we have
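The quantity $\Delta t$ in this statement is simply the largest gap between consecutive points of the discretization grid. As a minimal sketch, assuming the grid is given as a plain Python sequence of timesteps (the function name `max_timestep_interval` and the example grids are illustrative and do not come from the handbook), it could be computed as follows:

```python
from typing import Sequence


def max_timestep_interval(timesteps: Sequence[float]) -> float:
    """Return max_n |t_{n+1} - t_n| over consecutive timesteps.

    Hypothetical helper for illustration; the grid may be increasing
    or decreasing, since only absolute gaps are used.
    """
    if len(timesteps) < 2:
        raise ValueError("need at least two timesteps to form an interval")
    # Pair each timestep with its successor and take the largest absolute gap.
    return max(abs(t_next - t) for t, t_next in zip(timesteps, timesteps[1:]))


# A uniform grid has the same gap everywhere.
uniform_grid = [i / 10 for i in range(11)]

# A non-uniform grid (assumed example values): the largest gap dominates.
nonuniform_grid = [0.0, 0.1, 0.3, 0.7, 1.0]

delta_uniform = max_timestep_interval(uniform_grid)
delta_nonuniform = max_timestep_interval(nonuniform_grid)
```

On a non-uniform grid, $\Delta t$ is set by the single coarsest interval, so refining only the already fine regions of the grid leaves it unchanged.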

Figures (14)

  • Figure 1: Relationship between $v$-prediction in diffusion and InstaFlow-prediction in rectified flow
  • Figure 2: Velocity mapping from $v$-prediction in diffusion to InstaFlow-prediction in rectified flow
  • Figure 3: The "triangular" formula in the diffusion model for three parameterizations: $\epsilon$-prediction, $x$-prediction, and $v$-prediction.
  • Figure 4: The "triangular" formula in the rectified flow model for three parameterizations: $\epsilon$-prediction, $x$-prediction, and $v$-prediction.
  • Figure 5: $\epsilon$-prediction loss along the diffusion trajectory.
  • ...and 9 more figures

Theorems & Definitions (4)

  • Definition 2.1: Diffusion Model
  • Theorem 2.2
  • Proposition 2.3
  • Theorem 3.1: Global optimum of VSD