CFG-Ctrl: Control-Based Classifier-Free Diffusion Guidance

Hanyang Wang, Yiyang Liu, Jiawei Chi, Fangfu Liu, Ran Xue, Yueqi Duan

TL;DR

This paper explores a unified framework called CFG-Ctrl, which reinterprets CFG as a control applied to the first-order continuous-time generative flow, using the conditional-unconditional discrepancy as an error signal to adjust the velocity field.
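The TL;DR's control reading of vanilla CFG can be made concrete in a few lines. This is a minimal sketch assuming the common flow-matching form of CFG: the guided velocity is the unconditional prediction plus a fixed scale `w` times the conditional-unconditional discrepancy (the error signal). The function name `cfg_p_control` and the NumPy arrays standing in for model velocity predictions are illustrative assumptions. The paper's scale convention may differ; some codebases write the same rule as `v_c + (w - 1) * e`, which is algebraically identical.

```python
import numpy as np


def cfg_p_control(v_uncond: np.ndarray, v_cond: np.ndarray, w: float) -> np.ndarray:
    """Vanilla CFG read as proportional (P) control (illustrative sketch).

    The error signal is the conditional-unconditional discrepancy
    e = v_cond - v_uncond. The guided velocity adds a fixed gain w times
    that error to the unconditional prediction.
    """
    e = v_cond - v_uncond        # error signal (semantic discrepancy)
    return v_uncond + w * e      # fixed-gain proportional correction
```

With `w = 1` the rule returns the conditional prediction and with `w = 0` the unconditional one; a larger `w` is simply a larger fixed proportional gain, the regime in which the abstract reports instability and overshooting.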

Abstract

Classifier-Free Guidance (CFG) has emerged as a central approach for enhancing semantic alignment in flow-based diffusion models. In this paper, we explore a unified framework called CFG-Ctrl, which reinterprets CFG as a control applied to the first-order continuous-time generative flow, using the conditional-unconditional discrepancy as an error signal to adjust the velocity field. From this perspective, we interpret vanilla CFG as a proportional controller (P-control) with a fixed gain, and typical follow-up variants as extended control laws derived from it. However, existing methods mainly rely on linear control, which inherently leads to instability, overshooting, and degraded semantic fidelity, especially at large guidance scales. To address this, we introduce Sliding Mode Control CFG (SMC-CFG), which drives the generative flow toward a rapidly convergent sliding manifold. Specifically, we define an exponential sliding mode surface over the semantic prediction error and introduce a switching control term to establish nonlinear feedback-guided correction. Moreover, we provide a Lyapunov stability analysis to theoretically support finite-time convergence. Experiments across text-to-image generation models including Stable Diffusion 3.5, Flux, and Qwen-Image demonstrate that SMC-CFG outperforms standard CFG in semantic alignment and enhances robustness across a wide range of guidance scales. Project Page: https://hanyang-21.github.io/CFG-Ctrl
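The abstract names an exponential sliding surface over the semantic prediction error and a switching control term without giving their formulas. For orientation only, the textbook first-order form is shown below, written in the e-ė plane used in Figure 1. Here $\mathbf{e}$ stands for the error signal (in CFG-Ctrl, the conditional-unconditional discrepancy), and the symbols $\lambda$, $k$, $t_r$ and $\mathbf{u}_{\text{sw}}$ are assumptions; the paper's exact surface and switching law may differ.

```latex
\begin{aligned}
&\mathbf{s}(t) = \dot{\mathbf{e}}(t) + \lambda\,\mathbf{e}(t), \qquad \lambda > 0, \\
&\mathbf{s}(t) \equiv \mathbf{0} \text{ for } t \ge t_r
\;\Rightarrow\; \dot{\mathbf{e}} = -\lambda\,\mathbf{e}
\;\Rightarrow\; \mathbf{e}(t) = \mathbf{e}(t_r)\,e^{-\lambda (t - t_r)}, \\
&\mathbf{u}_{\text{sw}}(t) = -k\,\operatorname{sgn}\big(\mathbf{s}(t)\big), \qquad k > 0 .
\end{aligned}
```

Once the state is on the surface, the error decays exponentially at rate $\lambda$, which is the sense in which such a surface is "exponential"; the switching term is what pushes the state onto the surface in the first place.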

Paper Structure (32 sections, 1 theorem, 47 equations, 10 figures, 8 tables, 1 algorithm)

Key Result

Theorem 1

Consider the sliding-variable dynamics given in the paper's formal system equation, under its boundedness and dominance assumptions. If the switching gain $k$ satisfies the theorem's gain condition, which includes a safety margin $\epsilon > 0$, then the sliding variable $\mathbf{s}(t)$ converges to zero in finite time.
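A result of this kind typically follows from the standard sliding-mode reaching argument. Since the summary does not reproduce the exact dynamics or gain condition, the derivation below assumes the common textbook setting: sliding dynamics driven by a bounded disturbance $\boldsymbol{\delta}(t)$ with componentwise bound $D$, and a gain $k \ge D + \epsilon$. It is a sketch of why a positive margin $\epsilon$ yields a finite reaching time, not the paper's proof.

```latex
\begin{aligned}
&\text{Assumed: } \dot{\mathbf{s}} = \boldsymbol{\delta}(t) - k\,\operatorname{sgn}(\mathbf{s}),
\quad |\delta_i(t)| \le D, \quad k \ge D + \epsilon . \\
&V = \tfrac{1}{2}\|\mathbf{s}\|_2^2, \qquad
\dot V = \mathbf{s}^\top\boldsymbol{\delta} - k\|\mathbf{s}\|_1
\le -(k - D)\|\mathbf{s}\|_1 \le -\epsilon\|\mathbf{s}\|_2 = -\epsilon\sqrt{2V} . \\
&\frac{d}{dt}\sqrt{V} = \frac{\dot V}{2\sqrt{V}} \le -\frac{\epsilon}{\sqrt{2}}
\;\Rightarrow\; \mathbf{s}(t) = \mathbf{0} \text{ for all } t \ge t_r,
\qquad t_r \le \frac{\|\mathbf{s}(0)\|_2}{\epsilon} .
\end{aligned}
```

The step $\|\mathbf{s}\|_1 \ge \|\mathbf{s}\|_2$ is what turns the margin $\epsilon$ into a constant rate of decrease for $\sqrt{V}$, and hence into a finite reaching-time bound rather than only asymptotic convergence.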

Figures (10)

  • Figure 1: Phase diagram in the $\mathbf{e}$-$\dot{\mathbf{e}}$ plane. We schematically illustrate the convergence patterns of CFG and the proposed SMC-CFG. Left: CFG's ideal linear convergence trajectory and its strong oscillatory divergence at high guidance scales. Right: the proposed SMC-CFG, through a switching-forcing mechanism, drives the system states toward the sliding mode surface governed by parameter $\lambda$, achieving robust and rapid convergence.
  • Figure 2: Qualitative results across different T2I models. We provide visual comparisons between CFG and our SMC-CFG across various models. SMC-CFG exhibits better performance in positional relationships, text generation, and detailed object representation.
  • Figure 3: Qualitative comparison with baseline methods. In challenging scenarios involving relative positions, clothing styles, and human actions, baseline methods produce implausible outputs, while SMC-CFG remains consistent with the text prompt.
  • Figure 4: Visual comparison between CFG (top) and SMC-CFG (bottom) across different CFG scales.
  • Figure 5: Qualitative results under various hyperparameters.
  • ...and 5 more figures
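Figure 1's right panel describes two phases: the state is first forced onto the sliding surface and then slides along it to the origin of the e-ė plane. The toy simulation below reproduces that qualitative behavior on a textbook double integrator ë = u + d(t), not on a diffusion model. The plant, the disturbance d(t) = sin(3t), the gains, the reaching tolerance, and the function name `simulate_smc` are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_smc(e0: float, de0: float, lam: float = 2.0, k: float = 3.0,
                 dist_amp: float = 1.0, dt: float = 1e-3, t_end: float = 5.0):
    """Toy sliding-mode control of a double integrator e'' = u + d(t).

    Sliding variable: s = de + lam * e. The control cancels the lam * de term
    and adds a switching term, so each Euler step gives
    s_next = s + dt * (d - k * sign(s)). With k > dist_amp, s reaches a small
    band around zero in finite time; afterwards e decays roughly like
    exp(-lam * t) along the line de = -lam * e.
    """
    n = int(round(t_end / dt))
    e, de = e0, de0
    traj = np.empty((n + 1, 2))          # rows of (e, de) in the phase plane
    traj[0] = (e, de)
    t_reach = None                       # first time |s| < 1e-2 (tolerance band)
    for i in range(n):
        t = i * dt
        s = de + lam * e
        if t_reach is None and abs(s) < 1e-2:
            t_reach = t
        d = dist_amp * np.sin(3.0 * t)   # bounded disturbance, |d| <= dist_amp
        u = -lam * de - k * np.sign(s)   # equivalent term + switching term
        # Forward Euler step of the double integrator.
        e, de = e + dt * de, de + dt * (u + d)
        traj[i + 1] = (e, de)
    return traj, t_reach


traj, t_reach = simulate_smc(1.0, 0.0)
print(t_reach, traj[-1])
```

With `k = 3` exceeding the disturbance bound of 1, the sliding variable starting at s = 2 enters the tolerance band within the continuous-time bound |s(0)| / (k - D) = 1, after which e decays toward zero. The discretized switch makes s chatter in a band of width about dt * (k + D), a known artifact of sampled sliding-mode control.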

Theorems & Definitions (2)

  • Theorem 1: Robust Convergence
  • Proof of Theorem 1