Towards a unified framework for guided diffusion models

Yuchen Jiao, Yuxin Chen, Gen Li

TL;DR

This work unifies diffusion guidance (notably classifier-free guidance, CFG) and reward-guided diffusion under a single framework. It introduces a reward-guidance term, derived from the difference between the original and reward-reweighted scores, that steers the backward diffusion process. The authors provide a continuous-time SDE analysis with theoretical guarantees showing that guided samplers improve a defined reward metric (and reduce a related cost) relative to unguided baselines. The framework also enables a trajectory-free training approach for reward guidance via a denoising score-matching objective. Instantiated for CFG, it yields a theoretical characterization in terms of the reciprocal classifier probability and links CFG’s effectiveness to a broader cost-reduction view. Numerical experiments on Gaussian mixtures and ImageNet corroborate the theory for CFG, and additional experiments on synthetic data (GMM, Swiss roll) demonstrate the practicality and flexibility of the reward-guided sampler, including training efficiency and robustness to score estimation errors.

Abstract

Guided or controlled data generation with diffusion models has become a cornerstone of modern generative modeling. Despite substantial advances in diffusion model theory, the theoretical understanding of guided diffusion samplers remains severely limited. We make progress by developing a unified algorithmic and theoretical framework that accommodates both diffusion guidance and reward-guided diffusion. Aimed at fine-tuning diffusion models to improve certain rewards, we propose injecting a reward guidance term -- constructed from the difference between the original and reward-reweighted scores -- into the backward diffusion process, and rigorously quantify the resulting reward improvement over the unguided counterpart. As a key application, our framework shows that classifier-free guidance (CFG) decreases the expected reciprocal of the classifier probability, providing the first theoretical characterization of the specific performance metric that CFG improves for general target distributions. When applied to reward-guided diffusion, our framework yields a new sampler that is easy to train and requires no full diffusion trajectories during training. Numerical experiments further corroborate our theoretical findings.

Partial preliminary results of this work appeared in the International Conference on Machine Learning 2025 \citep{li2025provable}.
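To make the CFG claim concrete, the sketch below works through a toy one-dimensional case. It is an illustration under stated assumptions, not the paper's sampler: the two-component Gaussian mixture, the constants `MU0`, `MU1`, `VAR`, and all function names are hypothetical, and the guided score uses the common parameterization $s(x \mid c) + w\,(s(x \mid c) - s(x))$, assumed here so that $w=0$ recovers the unguided conditional score. In this toy model the guidance term equals the gradient of the log classifier probability, so a small step along it raises $-p_{c\,|\,X_0}(1\,|\,x)^{-1}$, the quantity averaged in the experiments.

```python
import math

# Hypothetical toy setup (not from the paper): equal-weight mixture of
# N(MU0, VAR) for class 0 and N(MU1, VAR) for class 1.
MU0, MU1, VAR = -2.0, 2.0, 1.0


def classifier_prob(x):
    """p(c=1 | x) under the toy mixture, via Bayes' rule (a sigmoid in x)."""
    z = (MU1 - MU0) * (x - 0.5 * (MU0 + MU1)) / VAR
    return 1.0 / (1.0 + math.exp(-z))


def cond_score(x):
    """Score of the class-1 component: d/dx log N(x; MU1, VAR)."""
    return -(x - MU1) / VAR


def uncond_score(x):
    """Score of the mixture: posterior-weighted average of component scores."""
    p1 = classifier_prob(x)
    return p1 * (-(x - MU1) / VAR) + (1.0 - p1) * (-(x - MU0) / VAR)


def guided_score(x, w):
    """Assumed CFG form: conditional score plus w times (conditional - unconditional)."""
    return cond_score(x) + w * (cond_score(x) - uncond_score(x))


def neg_reciprocal_prob(x):
    """The metric -1 / p(c=1 | x); larger values mean higher classifier probability."""
    return -1.0 / classifier_prob(x)


if __name__ == "__main__":
    x = 0.3
    guidance = cond_score(x) - uncond_score(x)
    print("guidance term:", guidance)
    print("metric before:", neg_reciprocal_prob(x))
    print("metric after small step:", neg_reciprocal_prob(x + 0.1 * guidance))
```

The example only checks a single step at a single point; the paper's guarantee concerns the expectation of this metric over samples from the full guided backward process, which this sketch does not simulate.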

Paper Structure

This paper contains 82 sections, 8 theorems, 236 equations, 5 figures, 1 table.

Key Result

Lemma 1

Consider any $0\le \delta<1$. For any $\tau$ and $t$ obeying $0 \le \tau \le t \le 1-\delta$, one has [equation not preserved in this extract]. Moreover, if $Y_{\delta} \sim p_{X_{1-\delta}}$, then it holds that [equation not preserved in this extract].

Figures (5)

  • Figure 1: Experimental results under the Gaussian mixture model. (Left) Proportions of samples with improved classifier probabilities; (Right) Averages of $-p_{c\,|\, X_0}(1\,|\, Y_1^w)^{-1}$ for varying guidance scales $w$.
  • Figure 2: Experimental results on the ImageNet dataset. (Left) Proportions of samples with improved classifier probabilities; (Right) Averages of $-p_{c\,|\, X_0}(1\,|\, Y_1^w)^{-1}$ for varying guidance scales $w$.
  • Figure 3: Empirical distributions of reward-guided sampler with specific values of $w$ vs. DDPM ($w=0$).
  • Figure 4: The target distribution is assumed to be the uniform distribution over the set of points shown in the figure.
  • Figure 5: Generated samples of reward-guided sampler with specific values of $w$ and DDPM ($w=0$).

Theorems & Definitions (9)

  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Theorem 2
  • Corollary 1: Effectiveness of classifier-free diffusion guidance
  • Corollary 2: Effectiveness of reward-guided diffusion models
  • Remark 1
  • Lemma 3
  • Theorem 3