Studying Classifier(-Free) Guidance From a Classifier-Centric Perspective
Xiaoming Zhao, Alexander G. Schwing
TL;DR
This work reconsiders classifier-free guidance (CFG) through a classifier-centric lens. It shows that both classifier guidance and CFG shape denoising trajectories by steering them away from decision boundaries, and that larger guidance scales amplify this effect. It revisits the derivation of classifier guidance and questions how universally its decomposition holds. It then demonstrates that a flow-matching postprocessing step, which targets the regions near decision boundaries, can both diagnose and improve high-dimensional conditional generation. The empirical results on 2D fractal data, MNIST, and CIFAR-10 support this perspective and show that flow-based postprocessing consistently improves fidelity and alignment with the conditioning across scales. The study provides a practical diagnostic tool and clarifies the relationship between classifier guidance and training with conditioning dropout, with implications for building more reliable conditional diffusion systems.
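For reference, the decomposition that classifier guidance builds on, and the classifier-free form derived from it, can be written in score form as below. The first line follows from Bayes' rule, p_t(x_t | c) = p_t(c | x_t) p_t(x_t) / p_t(c), because p_t(c) does not depend on x_t. Substituting the classifier gradient from that line into the classifier-guided score gives the classifier-free form. This uses one common convention, in which w is the guidance scale and w = 1 recovers the plain conditional score. Some works write the scale as 1 + w instead.

```latex
\begin{aligned}
\nabla_{x_t} \log p_t(x_t \mid c) &= \nabla_{x_t} \log p_t(x_t) + \nabla_{x_t} \log p_t(c \mid x_t) \\
\tilde{s}_{\mathrm{CG}}(x_t, c) &= \nabla_{x_t} \log p_t(x_t) + w \, \nabla_{x_t} \log p_t(c \mid x_t) \\
\tilde{s}_{\mathrm{CFG}}(x_t, c) &= \nabla_{x_t} \log p_t(x_t) + w \big( \nabla_{x_t} \log p_t(x_t \mid c) - \nabla_{x_t} \log p_t(x_t) \big)
\end{aligned}
```

When the scores are exact, the two guided scores coincide for every w. The classifier term is the part that a larger guidance scale amplifies.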
Abstract
Classifier-free guidance has become a staple for conditional generation with denoising diffusion models. However, a comprehensive understanding of classifier-free guidance is still missing. In this work, we carry out an empirical study to provide a fresh perspective on classifier-free guidance. Concretely, instead of focusing solely on classifier-free guidance, we trace it back to its root, i.e., classifier guidance, pinpoint the key assumption in its derivation, and conduct a systematic study to understand the role of the classifier. On 1D data, we find that both classifier guidance and classifier-free guidance achieve conditional generation by pushing the denoising diffusion trajectories away from decision boundaries, i.e., areas where conditional information is usually entangled and hard to learn. To validate this classifier-centric perspective on high-dimensional data, we assess whether a flow-matching postprocessing step can improve performance. This step is designed to narrow the gap between a pre-trained diffusion model's learned distribution and the real data distribution, especially near decision boundaries. Experiments on various datasets verify our classifier-centric understanding.
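The sketch below makes the 1D finding concrete under assumptions that are ours, not the paper's. It uses a hypothetical two-class 1D Gaussian mixture with closed-form scores and variance-exploding noise. It is not the authors' experimental setup or code, and every name in it (`classifier_grad`, `cfg_score`, `sample_pf_ode`, `MU`, `S2`) is illustrative. In this toy, the classifier gradient is large at the decision boundary x = 0 and on the class-0 side, and it vanishes deep inside class 1's region. Guidance therefore mostly moves trajectories that start near or across the boundary.

```python
import math

# Hypothetical 1D toy model (illustrative assumption, not the paper's setup):
# class c in {0, 1} with equal priors and data x ~ N(MEANS[c], S2).
# Adding Gaussian noise of variance sigma2 (variance-exploding style) keeps each
# class-conditional Gaussian, with variance v = S2 + sigma2.
MU, S2 = 2.0, 0.25
MEANS = {0: -MU, 1: MU}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classifier_prob(x, c, sigma2):
    """Noisy-data classifier p_t(c | x) from Bayes' rule; boundary at x = 0."""
    p1 = sigmoid(2.0 * MU * x / (S2 + sigma2))  # log-odds of class 1 is 2*MU*x/v
    return p1 if c == 1 else 1.0 - p1


def cond_score(x, c, sigma2):
    """grad_x log p_t(x | c) of the Gaussian class-conditional."""
    return -(x - MEANS[c]) / (S2 + sigma2)


def uncond_score(x, sigma2):
    """grad_x log p_t(x) of the mixture: posterior-weighted conditional scores."""
    return sum(classifier_prob(x, k, sigma2) * cond_score(x, k, sigma2) for k in (0, 1))


def classifier_grad(x, c, sigma2):
    """grad_x log p_t(c | x) in closed form; always points toward class c's side."""
    sign = 1.0 if c == 1 else -1.0
    return sign * (1.0 - classifier_prob(x, c, sigma2)) * 2.0 * MU / (S2 + sigma2)


def classifier_guided_score(x, c, sigma2, w):
    # Classifier guidance: unconditional score plus w times the classifier gradient.
    return uncond_score(x, sigma2) + w * classifier_grad(x, c, sigma2)


def cfg_score(x, c, sigma2, w):
    # Classifier-free guidance: extrapolate from the unconditional to the conditional score.
    u = uncond_score(x, sigma2)
    return u + w * (cond_score(x, c, sigma2) - u)


def sample_pf_ode(x, c, w, n_steps=200, s_max=16.0, s_min=1e-3):
    """Euler integration of the VE probability-flow ODE driven by the CFG score."""
    # Geometric schedule of noise variances, decreasing from s_max to s_min.
    levels = [s_max * (s_min / s_max) ** (k / (n_steps - 1)) for k in range(n_steps)]
    for s_hi, s_lo in zip(levels[:-1], levels[1:]):
        x = x + 0.5 * (s_hi - s_lo) * cfg_score(x, c, s_hi, w)
    return x


# The gradient is large at the boundary (x = 0) and on the class-0 side, and it
# vanishes deep inside class 1's region.
for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  grad log p(c=1|x) = {classifier_grad(x, 1, 0.25):.3g}")

# Starting slightly on class 0's side: w=0 stays there, while w=1 and w=3 cross over.
for w in (0.0, 1.0, 3.0):
    print(f"w={w}: final x = {sample_pf_ode(-0.5, 1, w):+.3f}")
```

Because the scores here are exact, `cfg_score` and `classifier_guided_score` agree for every w. With a learned score network and a separately trained noisy classifier, the two generally differ. The toy only illustrates the boundary-pushing mechanism.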
