Studying Classifier(-Free) Guidance From a Classifier-Centric Perspective

Xiaoming Zhao, Alexander G. Schwing

TL;DR

This work reconsiders classifier-free guidance through a classifier-centric lens, revealing that both classifier guidance (CG) and classifier-free guidance (CFG) shape denoising trajectories by steering them away from decision boundaries, with the effect amplified by larger guidance scales. It analyzes the foundational diffusion-process framework, questions the universality of the classifier-guidance decomposition, and demonstrates that a flow-matching postprocessing step can diagnose and improve high-dimensional conditional generation by focusing on regions near decision boundaries. The empirical results on 2D fractal data, MNIST, and CIFAR-10 support the central perspective and show that flow-based postprocessing consistently enhances fidelity and alignment with conditioning across scales. The study provides a practical diagnostic tool and clarifies the relationship between classifier-based guidance and conditioning-dropout training, with implications for developing more reliable conditional diffusion systems.

Abstract

Classifier-free guidance has become a staple for conditional generation with denoising diffusion models. However, a comprehensive understanding of classifier-free guidance is still missing. In this work, we carry out an empirical study to provide a fresh perspective on classifier-free guidance. Concretely, instead of solely focusing on classifier-free guidance, we trace back to the root, i.e., classifier guidance, pinpoint the key assumption for the derivation, and conduct a systematic study to understand the role of the classifier. On 1D data, we find that both classifier guidance and classifier-free guidance achieve conditional generation by pushing the denoising diffusion trajectories away from decision boundaries, i.e., areas where conditional information is usually entangled and is hard to learn. To validate this classifier-centric perspective on high-dimensional data, we assess whether a flow-matching postprocessing step that is designed to narrow the gap between a pre-trained diffusion model's learned distribution and the real data distribution, especially near decision boundaries, can improve the performance. Experiments on various datasets verify our classifier-centric understanding.
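To make the classifier-centric reading concrete, here is a minimal Python sketch on an analytic 1D two-class Gaussian toy. It is not the paper's setup: equal class priors, class means at ±mu, a shared standard deviation sigma at a single noise level, and all function names are assumptions made for illustration. Under these assumptions, Bayes' rule gives ∇ log p(c|x) = ∇ log p(x|c) − ∇ log p(x), so the classifier gradient equals the difference between the conditional and unconditional scores, and for the positive class it always points away from the decision boundary at x = 0.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def posterior_pos(x, mu, sigma):
    """p(c=+ | x) for equal priors and class densities N(+mu, sigma^2), N(-mu, sigma^2)."""
    return sigmoid(2.0 * mu * x / sigma**2)


def cond_score(x, mean, sigma):
    """Score d/dx log N(x; mean, sigma^2) of a single class."""
    return -(x - mean) / sigma**2


def uncond_score(x, mu, sigma):
    """Score of the equal-weight mixture: posterior-weighted average of class scores."""
    p = posterior_pos(x, mu, sigma)
    return p * cond_score(x, mu, sigma) + (1.0 - p) * cond_score(x, -mu, sigma)


def classifier_grad_pos(x, mu, sigma):
    """d/dx log p(c=+ | x); positive everywhere, so it pushes x away from the boundary at 0."""
    return (2.0 * mu / sigma**2) * (1.0 - posterior_pos(x, mu, sigma))


def guided_score(x, mu, sigma, scale):
    """Unconditional score plus a scaled classifier gradient (classifier-guidance form)."""
    return uncond_score(x, mu, sigma) + scale * classifier_grad_pos(x, mu, sigma)


if __name__ == "__main__":
    mu, sigma = 1.0, 0.5  # hypothetical values at one intermediate noise level
    for x in (-0.5, -0.1, 0.0, 0.1, 0.5):
        gap = cond_score(x, mu, sigma) - uncond_score(x, mu, sigma)
        print(f"x={x:+.1f}  classifier grad={classifier_grad_pos(x, mu, sigma):.4f}  "
              f"cond - uncond={gap:.4f}")
```

In this exact toy the two printed columns agree, so guidance with scale 1 recovers the conditional score. With learned networks, the two sides of the decomposition need not match, which is the assumption the paper examines.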

Paper Structure

This paper contains 31 sections, 16 equations, 16 figures, 3 tables.

Figures (16)

  • Figure 1: Classifier guidance decomposition (Eq. \ref{eq: guided_diffusion}) does not always hold. We apply classifier guidance on 1D data from $\mathcal{N}(\pm 1.0, 0.05)$, $\mathcal{N}(\pm 0.5, 0.05)$, and $\mathcal{N}(\pm 0.1, 0.05)$, respectively. The denoising diffusion process runs from left to right. For each dataset, we train a vanilla conditional diffusion model and a decomposed version, i.e., an unconditional diffusion model and a classifier. We generate 20k samples (10k for each class) from both sides of Eq. \ref{eq: guided_diffusion} with the same initial noise and compute the absolute differences at each step of the denoising diffusion process. The plot shows the mean and standard deviation of the difference. Evidently, the classifier guidance decomposition does not hold with equality.
  • Figure 2: Classifier guidance behavior is dominated by the classifier. We apply denoising diffusion models with classifier guidance on a 1D dataset with data from $\mathcal{N}(\pm 1.0, 0.05)$. The classifiers in Fig. \ref{fig: gaussian traj cg classifier nonlinear} and \ref{fig: gaussian traj cg classifier linear} differ. The denoising diffusion process in all plots runs from bottom to top. In Fig. \ref{fig: gaussian traj cg classifier nonlinear} and \ref{fig: gaussian traj cg classifier linear}, the first plot shows the classifier's accuracy on a validation set for each class throughout the diffusion process, i.e., $p_\theta ( c \vert \mathbf{x}_{t})$ in Eq. \ref{eq: guided_diffusion}, while the remaining three plots display the diffusion trajectories with different guidance scales. We observe: 1) classifier guidance essentially pushes the diffusion process away from the classifier's decision boundary, which lies around the origin; and 2) different classifiers can produce entirely different trajectories (Fig. \ref{fig: gaussian traj cg classifier nonlinear} vs. \ref{fig: gaussian traj cg classifier linear}). Since we use the same initial noise and the same unconditional diffusion model, i.e., $p_\theta (\mathbf{x}_{t} \vert \mathbf{x}_{t+1})$ in Eq. \ref{eq: guided_diffusion}, for all plots, the differences are solely due to the classifier.
  • Figure 2: Postprocessing for classifier-free guidance on CIFAR-10. We report conditional FID on 50k generations with seeds from 0 to 49999. Lower FID is better, and the best in each row is highlighted. Postprocessing is abbreviated as "Post.". See qualitative results in Fig. \ref{fig:cifar10 edm simplified}.
  • Figure 3: Classifier-free guidance distorts denoising diffusion trajectories. We apply denoising diffusion models with classifier-free guidance on a 1D dataset composed of data from $\mathcal{N}(\pm 1.0, 0.05)$. The denoising diffusion process in all plots runs from bottom to top. We use the same trained model and the same initial noise for all plots, so the trajectory differences are solely caused by the different guidance scales. The scales used in Fig. \ref{fig: gaussian traj cg} and in this figure differ because classifier guidance and classifier-free guidance differ in sensitivity: here, a scale of 2.5 distorts trajectories significantly, while a scale of 4 in Fig. \ref{fig: gaussian traj cg} causes only minor changes. We hypothesize that classifier-free guidance's greater sensitivity stems from its training with conditioning dropout.
  • Figure 4: Classifier guidance with flow-matching-based postprocessing (Sec. \ref{sec: postprocess}) on 2D fractal data. After training, the decision boundaries of all three classifiers roughly align with the diagonal from top-left to bottom-right. See Sec. \ref{sec: fractal exp setup} for the experiment setup. In Fig. \ref{fig: fractal cg 0.0} to \ref{fig: fractal cg 1.0}, the $3^\text{rd}$ (and $5^\text{th}$) plots show generated samples after applying postprocessing to generations from the $2^\text{nd}$ (and $4^\text{th}$) plots. For clear visualization, we only display generations for one class (see Fig. \ref{fig: fractal cg class 0} for the other class).
  • ...and 11 more figures
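The trajectory captions above compare paths generated from the same starting point under different guidance scales. A toy version of that comparison can be sketched in Python as follows. This is only an illustration under strong assumptions that are not taken from the paper: analytic scores for an equal-prior mixture of N(+mu, sigma^2) and N(-mu, sigma^2), a single fixed noise level, plain deterministic Euler steps along the guided score instead of a real diffusion sampler, and hypothetical function names and parameter values.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def guided_score(x, mu, sigma, scale):
    """Mixture score of N(+-mu, sigma^2) plus scale * d/dx log p(c=+ | x)."""
    p_pos = sigmoid(2.0 * mu * x / sigma**2)                 # analytic classifier p(c=+ | x)
    uncond = -(x - mu * (2.0 * p_pos - 1.0)) / sigma**2      # unconditional (mixture) score
    classifier_grad = (2.0 * mu / sigma**2) * (1.0 - p_pos)  # always > 0 for the + class
    return uncond + scale * classifier_grad


def trajectory(x0, scale, mu=1.0, sigma=1.0, step=0.05, n_steps=40):
    """Deterministic Euler steps along the guided score (toy stand-in for a sampler)."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + step * guided_score(xs[-1], mu, sigma, scale))
    return xs


if __name__ == "__main__":
    # Same start near the decision boundary at x = 0, different guidance scales.
    for scale in (0.0, 1.0, 4.0):
        xs = trajectory(x0=0.05, scale=scale)
        print(f"scale={scale:.1f}: start={xs[0]:.3f}, end={xs[-1]:.3f}")
```

Under these assumptions the end point moves farther from the boundary at x = 0 as the scale grows, which matches the qualitative behaviour the captions describe. The sketch says nothing about how learned classifiers or conditioning dropout behave.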