Characteristic Guidance: Non-linear Correction for Diffusion Model at Large Guidance Scale

Candi Zheng, Yuan Lan

TL;DR

This work addresses non-linear deviations that arise in diffusion model guidance at large scales by introducing characteristic guidance (CH), a training-free correction grounded in the method of characteristics to enforce FP-consistent dynamics. CH constructs a nonlinear denoising function from two base networks with a nonlinear input perturbation and a fixed-point correction, leveraging a harmonic ansatz to enable analytic relationships and zero mixing error in the infinitesimal-step limit. The authors validate CH theoretically on Gaussian toy models and empirically across magnet phase transitions, CIFAR-10, ImageNet-256, and Stable Diffusion, showing improved semantic control, reduced color/exposure artifacts, and better sampling diversity, often with little or no loss in standard quality metrics. The approach is compatible with a wide range of samplers and data types, suggesting a practical path to robust high-guidance diffusion that preserves detail and alignment with conditional prompts in real-world applications.

Abstract

Popular guidance for denoising diffusion probabilistic models (DDPMs) linearly combines distinct conditional models to provide enhanced control over samples. However, this approach overlooks nonlinear effects that become significant when the guidance scale is large. To address this issue, we propose characteristic guidance, a guidance method that provides a first-principles non-linear correction for classifier-free guidance. This correction forces guided DDPMs to respect the Fokker-Planck (FP) equation of the diffusion process, in a way that is training-free and compatible with existing sampling methods. Experiments show that characteristic guidance enhances the semantic characteristics of prompts and mitigates irregularities in image generation, proving effective in diverse applications ranging from simulating magnet phase transitions to latent space sampling.
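
The linear combination that the abstract refers to can be illustrated with a minimal sketch. It assumes the common classifier-free guidance convention in which a guidance scale $\omega$ weights the conditional noise prediction by $(1+\omega)$ and the unconditional one by $-\omega$; the function name `cfg_combine` and the toy vectors are hypothetical and only stand in for real network outputs.

```python
def cfg_combine(eps_cond, eps_uncond, omega):
    """Linear classifier-free guidance on two noise predictions.

    Assumed convention: (1 + omega) * eps_cond - omega * eps_uncond,
    so omega = 0 returns the conditional prediction unchanged.
    """
    return [(1.0 + omega) * c - omega * u for c, u in zip(eps_cond, eps_uncond)]


# Toy stand-ins for the outputs of the conditional and unconditional models.
eps_c = [1.0, 0.0]
eps_u = [0.5, 0.5]

print(cfg_combine(eps_c, eps_u, 0.0))  # no guidance: same as eps_c
print(cfg_combine(eps_c, eps_u, 4.0))  # large scale: extrapolates away from eps_u
```

Because the combination is a pure extrapolation of the two outputs evaluated at the same input, it grows without bound as $\omega$ increases; this is the regime in which the abstract says nonlinear effects become significant.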

Paper Structure

This paper contains 39 sections, 2 theorems, 71 equations, 20 figures, 1 table, 3 algorithms.

Key Result

Lemma 5.2

Let $\boldsymbol{\epsilon}(\mathbf{x}, t)$, $\boldsymbol{\epsilon}_1(\mathbf{x}, t)$, and $\boldsymbol{\epsilon}_2(\mathbf{x}, t)$ be three distinct solutions of the Fokker-Planck (FP) equation of the score for the Ornstein-Uhlenbeck (OU) process (Eq. 1.7), each satisfying the harmonic ansatz. Moreover, suppose their initial conditions satisfy [equation not shown]. Then the relation [equation not shown] holds, where $\Delta \mathbf{x}$ is given by [equation not shown], in which $\sigma(t) = \sqrt{1-e^{-t}}$.
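
The only formula of the lemma that survives here is the noise scale $\sigma(t) = \sqrt{1-e^{-t}}$. The sketch below evaluates it and, as an assumption not stated in this text, pairs it with the mean coefficient $e^{-t/2}$ of a variance-preserving OU forward process, so that the squared coefficients sum to one. The helper names `sigma` and `ou_perturb` are hypothetical.

```python
import math


def sigma(t):
    """Noise scale from the lemma: sigma(t) = sqrt(1 - exp(-t))."""
    return math.sqrt(1.0 - math.exp(-t))


def ou_perturb(x0, noise, t):
    """Perturb a clean sample x0 with a given noise vector at time t.

    Assumption: variance-preserving OU process, so the mean coefficient
    is exp(-t / 2) and mean_coef**2 + sigma(t)**2 == 1.
    """
    mean_coef = math.exp(-t / 2.0)
    s = sigma(t)
    return [mean_coef * x + s * n for x, n in zip(x0, noise)]


for t in (0.0, 0.1, 1.0, 10.0):
    print(t, sigma(t))  # sigma rises from 0 at t = 0 toward 1 for large t
```

Under this assumption, $\sigma(t)$ vanishes at $t = 0$, where the sample is clean, and approaches 1 as $t$ grows, where the sample is nearly pure noise.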

Figures (20)

  • Figure 1: Comparative visualization of images sampled from Stable Diffusion XL [podell2023sdxl] with Classifier-Free Guidance and with Characteristic Guidance (model: animagineXL 3.0 [animagine_xl_3_0]; seeds: 0, 1, 2, 3). By addressing the non-linear effects of the FP equation, characteristic guidance mitigates irregularities in color, exposure, and anatomy and enhances the prompt's semantic characteristics (e.g., "grass" depicted in the right-hand example).
  • Figure 2: Samples and KL divergence for DDPMs guided by characteristic guidance (CH) and classifier-free guidance (CF), modeling a conditional Gaussian distribution. The contours correspond to the theoretical reference (ground truth) distribution of the guided DDPM.
  • Figure 3: Comparison between CH (ours) and CF guided DDPMs on modeling a mixture of Gaussians. The contours correspond to the theoretical reference distribution of the guided DDPM. Samples from characteristic guidance show better KL divergence than those from classifier-free guidance.
  • Figure 4: Comparison between CH (ours) and CF guided DDPMs on simulating magnet cooling. The images in the upper and middle left (blue and red lattices) depict samples from the DDPMs, while the histograms illustrate the distribution of the samples' mean value (magnetization) across different temperatures. The contours correspond to the theoretical reference distribution of magnetization. Characteristic guidance achieves better NLL and better captures the peak separation of sample magnetization.
  • Figure 5: Left: CIFAR-10 aircraft images generated via DDPM highlight the difference between Classifier-Free Guidance (CF) and Characteristic Guidance (CH) across various guidance scales ($\omega$). The CF-guided images tend to have dull or even white backgrounds at higher $\omega$, whereas the CH-guided images show more vibrant scenes with skies and clouds. Right: ImageNet 256 volcano samples generated using latent diffusion models with CF and CH guidance, without cherry-picking. CF images show color cast and underexposure at higher $\omega$, while CH images maintain consistent color and exposure, better highlighting volcanic features such as smoke and lava. For these visual comparisons, consistent initial noise ensures that CF- and CH-guided images maintain similar contexts at lower $\omega$ values.
  • ...and 15 more figures

Theorems & Definitions (2)

  • Lemma 5.2
  • Theorem 5.3