C$^2$FG: Control Classifier-Free Guidance via Score Discrepancy Analysis

Jiayang Gao, Tianyi Zheng, Jiayang Zou, Fengxiang Yang, Shice Liu, Luyao Fan, Zheyu Zhang, Hao Zhang, Jinwei Chen, Peng-Tao Jiang, Bo Li, Jia Wang

TL;DR

The paper establishes strict upper bounds on the score discrepancy between conditional and unconditional distributions at each timestep of the diffusion process. These bounds explain the limitations of fixed-weight strategies and provide a principled foundation for time-dependent guidance.

Abstract

Classifier-Free Guidance (CFG) is a cornerstone of modern conditional diffusion models, yet its reliance on fixed or heuristic dynamic guidance weights is predominantly empirical and overlooks the inherent dynamics of the diffusion process. In this paper, we provide a rigorous theoretical analysis of Classifier-Free Guidance. Specifically, we establish strict upper bounds on the score discrepancy between conditional and unconditional distributions at different timesteps of the diffusion process. This finding explains the limitations of fixed-weight strategies and establishes a principled foundation for time-dependent guidance. Motivated by this insight, we introduce Control Classifier-Free Guidance (C$^2$FG), a novel, training-free, plug-in method that aligns the guidance strength with the diffusion dynamics via an exponential decay control function. Extensive experiments demonstrate that C$^2$FG is effective and broadly applicable across diverse generative tasks, while also exhibiting orthogonality to existing strategies.
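To make the idea of a time-dependent guidance weight concrete, here is a minimal Python sketch. It is not the authors' implementation. The combination rule is the usual CFG form in which $\omega = 1$ recovers the conditional prediction. The control function `guidance_weight`, its form $\omega_0 e^{-\lambda t}$, the direction of the decay, the normalized time range $t \in [0, 1]$, and the default values are all assumptions made for illustration; the abstract states only that the control function is an exponential decay.

```python
import math


def guidance_weight(t, omega0=2.0, lam=0.6):
    """Hypothetical control function omega(t) = omega0 * exp(-lam * t).

    The exact form used by the paper is not given in this summary; this is
    an assumed exponential decay in the diffusion time t.
    """
    return omega0 * math.exp(-lam * t)


def guided_prediction(cond, uncond, omega):
    """Standard CFG combination: uncond + omega * (cond - uncond).

    With omega = 1 this returns the conditional prediction unchanged.
    """
    return [u + omega * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    # Toy two-dimensional predictions standing in for model outputs.
    cond = [0.5, -1.0]
    uncond = [0.1, -0.2]
    # Reverse-time sweep from t = 1 (noise) down to t = 0 (data).
    for t in (1.0, 0.5, 0.0):
        w = guidance_weight(t)
        print(t, round(w, 3), guided_prediction(cond, uncond, w))
```

A fixed-weight baseline corresponds to calling `guided_prediction` with the same `omega` at every step, which is the strategy the analysis argues is limited.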

Paper Structure (31 sections, 16 theorems, 131 equations, 12 figures, 7 tables, 2 algorithms)

Key Result

Theorem 1

Assume that the sample space is bounded and closed. Consider the VP-SDE, and let $p(x,t)$ and $\tilde{p}(x,t)$ denote the probability densities at time $t$ induced by the initial distributions $p(x_0)$ and $\tilde{p}(x_0)$, respectively. Then the mean-square error (MSE) between the scores satisfies the uniform bound where $C$ is a constant, $\alpha(t) = \exp{(-\frac{1}{2}\int_0^t\beta_s{\textno...
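The statement above is truncated in this summary, so the bound itself cannot be reproduced here. As a hedged illustration of the one quantity that is partly visible, the sketch below evaluates a coefficient of the form $\exp(-\frac{1}{2}\int_0^t \beta_s\,ds)$, assuming that the truncated expression is the standard VP-SDE signal coefficient. The linear schedule $\beta_s = \beta_{\min} + s(\beta_{\max} - \beta_{\min})$ on $[0, 1]$ and the values 0.1 and 20.0 are assumptions, not taken from the paper.

```python
import math


def alpha(t, beta_min=0.1, beta_max=20.0):
    """Assumed VP-SDE coefficient exp(-0.5 * integral_0^t beta_s ds).

    Uses a hypothetical linear schedule beta_s = beta_min + s * (beta_max - beta_min),
    whose integral over [0, t] has the closed form below.
    """
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t * t
    return math.exp(-0.5 * integral)


if __name__ == "__main__":
    # alpha(0) = 1 (no noise added yet); the coefficient shrinks as t grows.
    for t in (0.0, 0.25, 0.5, 1.0):
        print(t, alpha(t))
```

Under these assumptions the coefficient decreases monotonically from 1 toward 0, which matches the qualitative picture of the data signal fading as noise accumulates.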

Figures (12)

  • Figure 1: Following Song2020ScoreBasedGM, (a) and (b) present results for $t\geq t_0>0$. (a) shows that the MSE between the conditional and unconditional scores is bounded by a function that tends to 0 as $t\to+\infty$; (b) shows that the normalized cosine similarity between the two score vectors decreases over reverse time, indicating that their directions gradually diverge during the inference process.
  • Figure 2: Noise-to-image process of C$^2$FG. The dynamic guidance weight $\omega(t)$ adaptively balances conditional and unconditional outputs at each timestep $t$ during generation, guided by theoretical bounds on the score function. Optionally, the method of kynkaanniemi2024applying can be added, fixing $\omega(t) = 1$ at the beginning of generation or as $t$ tends to 0.
  • Figure 3: A two-dimensional distribution featuring two classes represented by gray and orange regions. Approximately 99% of the probability mass is inside the shown contours. (a) Ground truth samples from the orange class. (b) EDM2 ($\omega=1$) produces some outliers. (c) $\beta$-CFG ($\alpha=\beta=2, \omega=1$) produces more outliers. (d) C$^2$FG ($\omega_0=1, \lambda=0.6$) generates fewer outliers and better matches the target distribution.
  • Figure 4: Qualitative Comparison. Qualitative comparison on Class-Conditional ImageNet datasets with different architectures and samplers. The sampler used and the number of inference steps are indicated in parentheses.
  • Figure 5: Reverse Diffusion with CFG
  • ...and 7 more figures

Theorems & Definitions (33)

  • Theorem 1: VP-SDE Score MSE Bound
  • proof
  • Theorem 2: VE-SDE Score MSE Bound
  • proof
  • Theorem 3: Harnack-type Inequality of VP-SDE
  • proof
  • Theorem 4: Harnack-type Inequality of VE-SDE
  • proof
  • proof: Proof of Theorem 1 (VP-SDE Score MSE Bound)
  • proof: Proof of Theorem 2 (VE-SDE Score MSE Bound)
  • ...and 23 more