Classifier-Free Guidance: From High-Dimensional Analysis to Generalized Guidance Forms
Krunoslav Lehman Pavasovic, Jakob Verbeek, Giulio Biroli, Marc Mezard
TL;DR
This paper analyzes Classifier-Free Guidance (CFG) in diffusion and flow-based generative models to characterize how guidance changes the generated distribution. The authors prove a blessing-of-dimensionality result: as the data dimension grows, including the infinite-dimensional limit, the distortions introduced by CFG vanish and guided samples follow the target distribution. They also introduce a broad family of generalized guidance forms, notably a non-linear power-law CFG, and show that these forms retain correct sampling in high dimensions. Experiments on class-conditional and text-to-image generation show improved fidelity, robustness, and diversity. The work connects the theory of high-dimensional dynamics to practical guidance strategies for modern diffusion and flow-matching models.
Abstract
Classifier-Free Guidance (CFG) is a widely adopted technique in diffusion and flow-based generative models, enabling high-quality conditional generation. A key theoretical challenge is characterizing the distribution induced by CFG, particularly in the high-dimensional settings relevant to real-world data. Previous works have shown that CFG modifies the target distribution, steering generation towards a distribution that is sharper than the target and shifted towards the boundary of the class. In this work, we provide a high-dimensional analysis of CFG, showing that these distortions vanish as the data dimension grows. We present a blessing-of-dimensionality result demonstrating that CFG accurately reproduces the target distribution in sufficiently high dimensions and in the infinite-dimensional limit. Using our high-dimensional theory, we show that a large family of guidance forms shares this property, including non-linear generalizations of CFG. We study a simple non-linear power-law version, for which we demonstrate improved robustness, sample fidelity, and diversity. Our findings are validated with experiments on class-conditional and text-to-image generation using state-of-the-art diffusion and flow-matching models.
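To make the two guidance rules concrete, here is a minimal sketch in Python with NumPy. It assumes the common convention in which the guided prediction is the conditional prediction plus a weight `omega` times the difference between the conditional and unconditional predictions. The abstract does not state the exact form of the power-law variant, so `power_law_cfg` below is an illustrative assumption: it scales the weight by the norm of that difference raised to a power `alpha`. The function names, the parameters `omega` and `alpha`, and the stabilizer `eps` are all hypothetical and chosen for illustration.

```python
import numpy as np


def cfg(cond, uncond, omega):
    """Standard (linear) CFG, assuming the convention
    guided = cond + omega * (cond - uncond); omega = 0 gives plain conditional sampling."""
    return cond + omega * (cond - uncond)


def power_law_cfg(cond, uncond, omega, alpha, eps=1e-12):
    """Illustrative non-linear variant (an assumption, not taken from the abstract):
    the guidance weight is omega * ||cond - uncond||**alpha, so alpha = 0
    reduces to standard CFG."""
    diff = cond - uncond
    # eps avoids raising an exact zero to a negative power
    norm = np.linalg.norm(diff)
    weight = omega * (norm + eps) ** alpha
    return cond + weight * diff


# Toy predictions standing in for the outputs of a score or velocity model
cond = np.array([3.0, 4.0])
uncond = np.array([0.0, 0.0])

linear = cfg(cond, uncond, omega=2.0)
nonlinear = power_law_cfg(cond, uncond, omega=2.0, alpha=1.0)
```

In this sketch, `alpha > 0` strengthens guidance when the conditional and unconditional predictions disagree strongly and weakens it as they approach each other, while `alpha = 0` falls back to the linear rule.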
