What does guidance do? A fine-grained analysis in a simple setting

Muthu Chidambaram, Khashayar Gatmiry, Sitan Chen, Holden Lee, Jianfeng Lu

TL;DR

The work provides a rigorous, fine-grained analysis showing that diffusion guidance does not sample from the intended tilted distribution. It characterizes the guided probability flow ODE in two simple mixture settings, mixtures of compactly supported distributions and mixtures of Gaussians. In both, large guidance concentrates samples at the edges of the conditional support. When score estimates are imperfect, large guidance can also push samples off the support and degrade quality. The authors derive practical prescriptions for choosing the guidance strength and validate the theory on synthetic data and on image models (MNIST, ImageNet). Together, these results clarify the geometry and limitations of guidance and give a principled basis for deploying it in controlled generation.

Abstract

The use of guidance in diffusion models was originally motivated by the premise that the guidance-modified score is that of the data distribution tilted by a conditional likelihood raised to some power. In this work we clarify this misconception by rigorously proving that guidance fails to sample from the intended tilted distribution. Our main result is to give a fine-grained characterization of the dynamics of guidance in two cases, (1) mixtures of compactly supported distributions and (2) mixtures of Gaussians, which reflect salient properties of guidance that manifest on real-world data. In both cases, we prove that as the guidance parameter increases, the guided model samples more heavily from the boundary of the support of the conditional distribution. We also prove that for any nonzero level of score estimation error, sufficiently large guidance will result in sampling away from the support, theoretically justifying the empirical finding that large guidance results in distorted generations. In addition to verifying these results empirically in synthetic settings, we also show how our theoretical insights can offer useful prescriptions for practical deployment.
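
The premise questioned in the abstract, and the reason it fails, can be written out in a few lines. This is a sketch under assumptions that may not match the paper's notation. It assumes guidance weight $w$ in the classifier-free form, so that $w = 0$ recovers the conditional score. It also assumes a forward noising kernel $K_t(x \mid x_0)$ defining $p_t$, and a label $y$ that depends on $x_t$ only through $x_0$.

```latex
% Guided score at noise level t, using Bayes: \nabla\log p_t(x|y) = \nabla\log p_t(x) + \nabla\log p_t(y|x)
s^{w}_t(x) = \nabla_x \log p_t(x \mid y) + w\big(\nabla_x \log p_t(x \mid y) - \nabla_x \log p_t(x)\big)
           = \nabla_x \log\big[p_t(x)\, p_t(y \mid x)^{1+w}\big]

% Noising the intended tilted target q(x_0) \propto p(x_0)\, p(y \mid x_0)^{1+w} with the same K_t gives
q_t(x) \propto p_t(x)\, \mathbb{E}\big[p(y \mid x_0)^{1+w} \mid x_t = x\big],
\qquad
p_t(y \mid x) = \mathbb{E}\big[p(y \mid x_0) \mid x_t = x\big]

% So s^{w}_t is the score of q_t only if, as functions of x,
\mathbb{E}\big[p(y \mid x_0) \mid x_t = x\big]^{1+w} \;\propto\; \mathbb{E}\big[p(y \mid x_0)^{1+w} \mid x_t = x\big]
```

By Jensen's inequality, for $w > 0$ the left-hand side is strictly smaller than the right-hand side whenever $p(y \mid x_0)$ is not constant under the posterior of $x_0$ given $x_t = x$. The size of this gap generally varies with $x$, so the guided ODE is not the probability flow of the tilted target.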

Paper Structure (28 sections, 17 theorems, 133 equations, 18 figures)

Key Result

Theorem 1

Consider a data distribution $p=\frac{1}{2} p^{(1)} + \frac{1}{2} p^{(-1)}$, where $p^{(1)}$ and $p^{(-1)}$ are $\beta$-bounded and supported on the disjoint intervals $[\alpha_1,\alpha_2]$ and $[-\alpha_2,-\alpha_1]$, respectively (see Assumption \ref{assump:ratio}). Suppose that one runs the probability flow ODE w… where the $O(\cdot)$ notation hides constants depending on $\alpha_1,\alpha_2,\beta$.
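
The sketch below integrates a guided probability flow ODE for a 1D instance of this setting with $\alpha_1 = 1$ and $\alpha_2 = 2$, to show the edge-concentration effect of Theorem 1 numerically. It is an illustration under stated assumptions, not the authors' code. The forward process is assumed to be the Ornstein-Uhlenbeck process $x_t = e^{-t}x_0 + \sqrt{1-e^{-2t}}\,z$. Each uniform component is replaced by a hypothetical grid of equally weighted atoms, so that exact noised scores are available in closed form. Guidance is taken to be $s_{\text{cond}} + w\,(s_{\text{cond}} - s_{\text{uncond}})$. The helper names `atoms`, `empirical_score`, and `sample_guided` are made up for this example.

```python
import numpy as np

def atoms(a, b, n=200):
    # Hypothetical discretization: Uniform[a, b] approximated by n evenly spaced atoms.
    return np.linspace(a, b, n)

def empirical_score(x, t, pts):
    """Exact score of p_t when p_0 is the uniform empirical measure on `pts`,
    assuming the OU forward process x_t = e^{-t} x_0 + sqrt(1 - e^{-2t}) z."""
    m = np.exp(-t)
    var = 1.0 - np.exp(-2.0 * t)
    logits = -(x[:, None] - m * pts[None, :]) ** 2 / (2.0 * var)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    post = np.exp(logits)
    post /= post.sum(axis=1, keepdims=True)        # posterior over atoms given x_t
    return (m * (post @ pts) - x) / var            # (m * E[x_0 | x_t] - x_t) / var

def sample_guided(w, a1=1.0, a2=2.0, n=1000, T=5.0, t_min=1e-4, n_steps=400, seed=0):
    rng = np.random.default_rng(seed)
    pos = atoms(a1, a2)                            # support of the positive class
    both = np.concatenate([pos, atoms(-a2, -a1)])  # equal class weights 1/2 each
    x = rng.standard_normal(n)                     # p_T is close to N(0, 1)
    ts = np.geomspace(T, t_min, n_steps + 1)       # finer steps near t = 0
    for t, t_next in zip(ts[:-1], ts[1:]):
        s_cond = empirical_score(x, t, pos)
        s_unc = empirical_score(x, t, both)
        s = s_cond + w * (s_cond - s_unc)          # guided score
        # Euler step backward in time for the probability flow ODE dx/dt = -x - s
        x = x + (t - t_next) * (x + s)
    return x

for w in (0.0, 1.0, 4.0):
    x = sample_guided(w)
    print(f"w={w}: mean={x.mean():.3f}, in support={np.mean((x > 0.9) & (x < 2.1)):.3f}")
```

Comparing the sample means across guidance weights should show samples drifting toward $\alpha_2 = 2$, the edge of the positive support farthest from the other class. They should still stay inside the support, because the scores here are exact. The geometric time grid is a design choice: it keeps the Euler step small relative to the noise variance near $t = 0$, where the score becomes stiff.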

Figures (18)

  • Figure 1: We consider sampling from the positive class of a 2D mixture of uniforms (a) using the probability flow ODE with the conditional score (b) and the guided score (c). As can be seen, increasing the guidance weight $w$ clearly biases the distribution of samples to concentrate towards points far away from the other class support.
  • Figure 2: Final samples and mean sampling trajectories produced by solving the probability flow ODE guided towards $y = +1$ for the distribution of Figure 1. The proportion of good samples (i.e., those that land in the class support) is shown with each sample plot, and a one-standard-deviation band is shown around each mean trajectory.
  • Figure 3: Final samples and mean projected trajectories produced by sampling from the classifier-free guidance MNIST model of Pearce (2024). The positive class is the digit 0, and the negative class consists of all other digits. Rows of samples, from top to bottom, correspond to increasing guidance values.
  • Figure 4: Final samples and mean projected trajectories produced by sampling from the classifier-guided ImageNet diffusion model of Dhariwal and Nichol (2021). The positive class is ImageNet class 292 (tiger). As before, rows of samples, from top to bottom, correspond to increasing guidance values.
  • Figure 5: Mixture-of-Gaussians analogue of Figure 2. Here the proportion of good samples is the proportion of samples that did not diverge. Some trajectories blow up due to numerical instability, which makes the mean projected trajectory plots less meaningful.
  • ...and 13 more figures
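
The Gaussian setting admits closed-form scores, which makes a compact sanity check possible. The sketch below is a hypothetical 1D illustration, not the paper's experiment. It assumes $p = \tfrac12\mathcal N(\mu,\sigma_0^2) + \tfrac12\mathcal N(-\mu,\sigma_0^2)$ with $\mu = 2$ and $\sigma_0 = 0.5$, and an Ornstein-Uhlenbeck forward process. Guidance is written as the conditional score plus $w\,\nabla_x \log p_t(y{=}{+}1 \mid x)$. For this mixture the class posterior is a logistic function of $x$, so the guidance term is always positive and pushes samples away from the negative component. The names `guided_score` and `sample` are made up for this example.

```python
import numpy as np

def guided_score(x, t, w, mu=2.0, sigma0=0.5):
    """Guided score toward y=+1 for p = 1/2 N(mu, sigma0^2) + 1/2 N(-mu, sigma0^2),
    assuming the OU forward process x_t = e^{-t} x_0 + sqrt(1 - e^{-2t}) z."""
    m = np.exp(-t)
    s2 = m**2 * sigma0**2 + 1.0 - np.exp(-2.0 * t)  # variance of each noised component
    a = m * mu                                      # noised component means are +/- a
    cond = (a - x) / s2                             # score of p_t(x | y=+1)
    z = 2.0 * a * x / s2                            # log-odds of y=+1 given x_t = x
    # d/dx log p_t(y=+1 | x) = (2a / s2) * sigmoid(-z), computed stably
    grad_log_post = (2.0 * a / s2) * np.exp(-np.logaddexp(0.0, z))
    return cond + w * grad_log_post

def sample(w, n=4000, T=6.0, t_min=1e-3, n_steps=400, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                      # p_T is close to N(0, 1)
    ts = np.geomspace(T, t_min, n_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        # Euler step backward in time for the probability flow ODE dx/dt = -x - score
        x = x + (t - t_next) * (x + guided_score(x, t, w))
    return x

for w in (0.0, 1.0, 4.0):
    x = sample(w)
    print(f"w={w}: mean={x.mean():.2f}, std={x.std():.2f}, frac(x<0)={np.mean(x < 0):.3f}")
```

With $w = 0$ the samples should match the positive component $\mathcal N(2, 0.25)$. Larger $w$ should shift them further from the negative component, in line with the edge-seeking behaviour the paper analyzes.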

Theorems & Definitions (31)

  • Theorem 1: Compactly supported setting (informal; see the formal statement, Theorem \ref{thm:convergencecompact})
  • Theorem 2: Gaussian setting
  • Theorem 3
  • Theorem 4: Convergence to the edge of the support
  • Lemma 1: Alternative view on probability flow ODE
  • Proof
  • Lemma 2: Positive push toward right
  • Proof
  • Lemma 3: First phase
  • Proof
  • ...and 21 more