Classifier-Free Guidance: From High-Dimensional Analysis to Generalized Guidance Forms

Krunoslav Lehman Pavasovic, Jakob Verbeek, Giulio Biroli, Marc Mezard

TL;DR

Classifier-Free Guidance (CFG) is analyzed in diffusion and flow-based generative models to understand how conditioning shapes the generated distribution. The authors prove a blessing-of-dimensionality result: in high or infinite dimensions, the distortions introduced by CFG vanish and guided samples converge to the target distribution. Building on this theory, they introduce a broad family of generalized guidance forms, notably a non-linear power-law CFG. They show that these generalized guidances also sample correctly in high dimensions, and they demonstrate improved fidelity, robustness, and diversity in class-conditional and text-to-image experiments. The work bridges theory and practice by connecting high-dimensional dynamics to practical guidance strategies for modern diffusion and flow-matching models.

Abstract

Classifier-Free Guidance (CFG) is a widely adopted technique in diffusion and flow-based generative models, enabling high-quality conditional generation. A key theoretical challenge is characterizing the distribution induced by CFG, particularly in the high-dimensional settings relevant to real-world data. Previous works have shown that CFG modifies the target distribution, steering samples toward a distribution that is sharper than the target and shifted further toward the boundary of the class. In this work, we provide a high-dimensional analysis of CFG, showing that these distortions vanish as the data dimension grows. We present a blessing-of-dimensionality result demonstrating that in sufficiently high and infinite dimensions, CFG accurately reproduces the target distribution. Using our high-dimensional theory, we show that a large family of guidances enjoys this property, in particular non-linear CFG generalizations. We study a simple non-linear power-law version, for which we demonstrate improved robustness, sample fidelity and diversity. Our findings are validated with experiments on class-conditional and text-to-image generation using state-of-the-art diffusion and flow-matching models.
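
The captions below use $\omega=0$ for the unguided target and $\alpha=0$ for standard CFG. A minimal sketch in that convention writes the guided score as $s = s_c + \omega\,\|s_c - s_u\|^{\alpha}(s_c - s_u)$, where $s_c$ and $s_u$ are the conditional and unconditional scores of one sample. The power-law factor $\|s_c - s_u\|^{\alpha}$ is our assumed parameterization of the non-linear variant, and `guided_score` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def guided_score(s_cond, s_uncond, omega, alpha=0.0, eps=1e-12):
    """Hypothetical helper combining conditional and unconditional scores.

    Standard CFG (alpha = 0):   s = s_c + omega * (s_c - s_u)
    Assumed power-law variant:  s = s_c + omega * ||s_c - s_u||**alpha * (s_c - s_u)
    omega = 0 returns the conditional score, i.e. no guidance.
    """
    s_cond = np.asarray(s_cond, dtype=float)
    s_uncond = np.asarray(s_uncond, dtype=float)
    diff = s_cond - s_uncond
    # Euclidean norm of the guidance direction for this single sample;
    # eps keeps a negative alpha finite when the two scores coincide.
    scale = max(float(np.linalg.norm(diff)), eps) ** alpha
    return s_cond + omega * scale * diff


if __name__ == "__main__":
    s_c = np.array([1.0, 2.0])
    s_u = np.array([-1.0, 2.0])
    print(guided_score(s_c, s_u, omega=1.0))             # [3. 2.]
    print(guided_score(s_c, s_u, omega=1.0, alpha=1.0))  # [5. 2.]
```

With $\alpha>0$, the guidance is amplified where the two scores disagree strongly and damped where they nearly agree. A batched version would need one norm per sample rather than one over the whole array.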

Paper Structure

This paper contains 55 sections, 77 equations, 28 figures, 5 tables.

Figures (28)

  • Figure 1: Qualitative comparison of unguided sampling, standard Classifier-Free Guidance (CFG), and our proposed non-linear power-law version (DiT-XL/2 on ImageNet-1K $256\times256$). Compared with unguided sampling, standard CFG increases fidelity at a substantial cost to diversity and semantic meaning. Our power-law guidance improves fidelity at no cost to semantics or diversity. Samples in each column start from the same seed.
  • Figure 2: Left: CFG produces the exact target distribution in high dimensions. We simulate the backward process for a mixture of two Gaussians and plot the generated samples projected onto the target mean $+\vec{m}$: $q(t=0)=\vec{x}\cdot\vec{m}/|\vec{m}|$. For small dimension ($d=2$), CFG generates a distribution whose mean has a larger magnitude (dashed line) and whose variance is smaller than that of the target ($\omega=0$). This effect diminishes as the dimension increases and is practically absent at $d=200$. Right: High dimensionality allows CFG trajectories to realign. We plot the evolution of the mean projection $q(t)$ of the trajectories, from large forward times ($t=1$, noise) to $t=0$ (data). For $d=2$, CFG trajectories do not realign with the unguided trajectories by $t=0$, which causes the CFG overshoot. For $d=200$, the trajectories realign with the unguided ones at the speciation time $t_s$, yielding the correct target distribution.
  • Figure 3: Dynamical regimes in diffusion. Left: illustration of the speciation phenomenon using a one-dimensional Gaussian mixture. Starting from pure Gaussian noise at large time $t$, the backward diffusion begins in Regime I, where the class has not yet been decided. After the speciation time $t_\textrm{s}$ (dashed line), class membership is decided. Right: evolution of the effective potential (the conditional potential defined in the main text) over time for a high-dimensional Gaussian mixture, illustrating the symmetry-breaking phenomenon.
  • Figure 4: Evolution of the CFG score difference from noise ($t=1$) to data ($t=0$). Left (standard CFG): numerical simulation of a mixture of two Gaussians; as $d$ increases, the score difference becomes substantial earlier (during Regime I). Middle (non-linear CFG, $d=200$): the non-linear CFG parameter $\alpha$ allows more flexible behavior of the score difference. Right (standard CFG): real-world experiments with advanced models show behavior consistent with the theory, namely a monotonically increasing score difference followed by decay after a certain point. Experimental details are provided in Appendices E and F.
  • Figure 5: Sensitivity analysis (EDM2-S, ImageNet-1K $512\times512$). Left: increasing the parameter $\alpha$ consistently improves FID relative to standard CFG ($\alpha=0$). Right: increasing $\alpha$ yields more stable FID values across a wider range of $\omega$.
  • ...and 23 more figures
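
The Figure 2 experiment can be reproduced in miniature. This is a sketch under our own assumptions, not the authors' setup:

- The data are $\tfrac12\mathcal{N}(+\vec m, I) + \tfrac12\mathcal{N}(-\vec m, I)$ with $\vec m = (1,\dots,1)$, so $|\vec m| = \sqrt d$.
- The forward process is the Ornstein-Uhlenbeck SDE $dx = -x\,dt + \sqrt2\,dW$.
- Sampling targets the $+\vec m$ class with standard CFG, $s = s_c + \omega(s_c - s_u)$.

Because the two scores differ only along $\vec m$ and the noise is isotropic, the projection $q = \vec x\cdot\vec m/|\vec m|$ follows an exact one-dimensional reverse SDE. Dimension enters it only through $|\vec m|$. Running the guided and unguided samplers with the same random numbers isolates the shift of the mean caused by CFG.

```python
import math
import random


def projected_scores(q, t, m_norm):
    """Conditional ('+' class) and unconditional scores along m/|m| at forward time t.

    Assumed model: p_t = 0.5 N(+m e^{-t}, I) + 0.5 N(-m e^{-t}, I).
    """
    mu = m_norm * math.exp(-t)                    # projected component mean at time t
    s_cond = -(q - mu)                            # score of the '+' Gaussian
    s_uncond = -(q - mu * math.tanh(q * mu))      # score of the symmetric mixture
    return s_cond, s_uncond


def mean_final_q(d, omega, n_samples=500, t_max=6.0, n_steps=300, seed=0):
    """Euler-Maruyama on the reverse SDE dq = (q + 2 s) dtau + sqrt(2) dW with CFG score s."""
    rng = random.Random(seed)                     # same seed -> common random numbers
    m_norm = math.sqrt(d)
    dt = t_max / n_steps
    total = 0.0
    for _ in range(n_samples):
        q = rng.gauss(0.0, 1.0)                   # p_{t_max} is close to N(0, 1) along m
        t = t_max
        for _ in range(n_steps):
            s_c, s_u = projected_scores(q, t, m_norm)
            s = s_c + omega * (s_c - s_u)         # standard CFG; omega = 0 is unguided
            q += (q + 2.0 * s) * dt + math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)
            t -= dt
        total += q
    return total / n_samples


def relative_mean_overshoot(d, omega, **kwargs):
    """(guided mean - unguided mean) / |m| of the projected samples at t = 0."""
    shift = mean_final_q(d, omega, **kwargs) - mean_final_q(d, 0.0, **kwargs)
    return shift / math.sqrt(d)


if __name__ == "__main__":
    for d in (2, 200):
        print(d, round(relative_mean_overshoot(d, omega=2.0), 4))
```

In this toy model the guidance term is non-negligible only before the class is decided. Any shift it causes then decays by roughly $e^{-t}$ on the way to $t=0$. The relative overshoot is therefore large for $d=2$ and small for $d=200$. The sketch checks only the mean, not the variance.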
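
The symmetry breaking sketched in Figure 3 can be checked on a toy model. We assume a symmetric two-Gaussian mixture whose components at forward time $t$ are $\mathcal{N}(\pm\vec m e^{-t}, I)$; this is our assumption, not the paper's exact setting. Up to a $q$-independent constant, the potential $-\log p_t$ along $\vec m/|\vec m|$ is $V(q,t) = q^2/2 - \log\cosh(q|\vec m|e^{-t})$. Its curvature at the origin, $1 - |\vec m|^2 e^{-2t}$, changes sign at $t = \ln|\vec m|$, which equals $\tfrac12\ln d$ for $\vec m=(1,\dots,1)$.

- Above that time, $V$ has a single well at $q=0$ and the class is undecided.
- Below it, two wells form, one per class.

This is the unconditional potential of the toy model, not the conditional potential plotted in the paper.

```python
import math


def log_cosh(x):
    """Numerically stable log(cosh(x))."""
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)


def effective_potential(q, t, m_norm):
    """-log p_t(q) along m/|m|, up to a q-independent constant (assumed toy model)."""
    mu = m_norm * math.exp(-t)
    return 0.5 * q * q - log_cosh(q * mu)


def curvature_at_origin(t, m_norm):
    """Second derivative of effective_potential at q = 0, i.e. 1 - |m|^2 e^{-2t}."""
    mu = m_norm * math.exp(-t)
    return 1.0 - mu * mu


def symmetry_breaking_time(m_norm):
    """Forward time at which the curvature at the origin changes sign: t = ln|m|."""
    return math.log(m_norm)


if __name__ == "__main__":
    m_norm = math.sqrt(200)                       # m = (1, ..., 1) in d = 200
    t_sb = symmetry_breaking_time(m_norm)
    print(round(t_sb, 4))                         # 0.5 * ln(200) ≈ 2.6492
    for t in (t_sb + 1.0, t_sb - 1.0):
        # True: single well (class undecided); False: double well (class decided)
        print(round(t, 2), curvature_at_origin(t, m_norm) > 0)
```

Since $t = \tfrac12\ln d$ grows with dimension, the toy model decides the class earlier in the backward process as $d$ increases.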
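
The trend in the left panel of Figure 4 can be read off a toy model. As a sketch, we assume a symmetric two-Gaussian mixture with components $\mathcal{N}(\pm\vec m e^{-t}, I)$ at forward time $t$ and $\vec m = (1,\dots,1)$. We evaluate along the noiseless conditional mean path $q(t) = |\vec m|e^{-t}$; all of this is our assumption. The projected score difference is then $|s_c - s_u| = u(1-\tanh u^2)$ with $u = |\vec m|e^{-t}$. It peaks at a fixed $u^\ast \approx 0.61$, so the peak sits at $t^\ast = \ln(|\vec m|/u^\ast)$ and moves to larger $t$ (earlier in the backward process) as $d$ grows.

We use unnormalized OU time rather than the caption's $t\in[0,1]$. The `guidance_magnitude` helper applies the assumed power-law form $\omega\|\Delta\|^{\alpha}\Delta$ and is hypothetical.

```python
import math


def score_difference(t, m_norm):
    """|s_c - s_u| along m on the noiseless conditional mean path q(t) = |m| e^{-t}."""
    mu = m_norm * math.exp(-t)
    q = mu                                        # mean of the '+' component at time t
    return mu * (1.0 - math.tanh(q * mu))


def guidance_magnitude(t, m_norm, omega, alpha=0.0):
    """Size of the guidance term omega * |Delta|**alpha * |Delta| (assumed power-law form)."""
    delta = score_difference(t, m_norm)
    return omega * delta ** (1.0 + alpha)


def peak_time(d, t_max=6.0, n_grid=601):
    """Forward time in [0, t_max] where the score difference is largest, for m = (1,...,1)."""
    m_norm = math.sqrt(d)
    grid = [t_max * i / (n_grid - 1) for i in range(n_grid)]
    return max(grid, key=lambda t: score_difference(t, m_norm))


if __name__ == "__main__":
    for d in (2, 20, 200):
        print(d, round(peak_time(d), 2))
```

Because the peak depends on $d$ only through $\ln|\vec m|$, going from $d=2$ to $d=200$ shifts it by $\ln 10$. With $\alpha>0$ and $|\Delta|<1$, the assumed power-law factor shrinks the guidance term relative to standard CFG.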