TCFG: Tangential Damping Classifier-free Guidance

Mingi Kwon, Shin seong Kim, Jaeseok Jeong, Yi Ting Hsiao, Youngjung Uh

TL;DR

This work addresses the misalignment between unconditional and conditional scores under classifier-free guidance (CFG) in diffusion-based text-to-image synthesis. It introduces Tangential Damping Classifier-free Guidance (TCFG), which applies SVD to the score pair matrix to remove tangential, misaligned components and project the unconditional score onto the dominant normal direction of the conditional manifold, yielding a refined guidance signal. The approach achieves consistent FID improvements across multiple diffusion models (SD v1.5, SDXL, SD v3) and DiT on ImageNet, with negligible computational overhead and stable CLIP scores. By revealing and exploiting the manifold/tangent-space structure of diffusion scores, TCFG enhances conditional image quality, reduces overexposure biases, and remains compatible with existing CFG improvements and high-resolution generation scenarios.

Abstract

Diffusion models have achieved remarkable success in text-to-image synthesis, largely attributed to the use of classifier-free guidance (CFG), which enables high-quality, condition-aligned image generation. CFG combines the conditional score (e.g., text-conditioned) with the unconditional score to control the output. However, the unconditional score is in charge of estimating the transition between manifolds of adjacent timesteps from $x_t$ to $x_{t-1}$, which may inadvertently interfere with the trajectory toward the specific condition. In this work, we introduce a novel approach that leverages a geometric perspective on the unconditional score to enhance CFG performance when conditional scores are available. Specifically, we propose a method that filters the singular vectors of both conditional and unconditional scores using singular value decomposition. This filtering process aligns the unconditional score with the conditional score, thereby refining the sampling trajectory to stay closer to the manifold. Our approach improves image quality with negligible additional computation. We provide deeper insights into the score function behavior in diffusion models and present a practical technique for achieving more accurate and contextually coherent image synthesis.
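The SVD-based filtering described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `tcfg_guidance` is hypothetical, and keeping only the top right-singular vector of the stacked score pair (a rank-1 projection) is an assumed reading of "remove tangential components". The final combination is the standard CFG form `uncond + scale * (cond - uncond)`, applied to the damped unconditional score.

```python
import numpy as np

def tcfg_guidance(cond, uncond, scale):
    """Hypothetical sketch: damp the part of `uncond` that lies outside the
    dominant direction of the (cond, uncond) pair, then apply standard CFG."""
    shape = cond.shape
    c = cond.reshape(-1).astype(np.float64)
    u = uncond.reshape(-1).astype(np.float64)

    # Stack the score pair into a 2 x d matrix and take its thin SVD.
    pair = np.stack([c, u], axis=0)
    _, _, vh = np.linalg.svd(pair, full_matrices=False)  # vh has shape (2, d)

    # Assumption: keep only the top right-singular vector and project the
    # unconditional score onto it, discarding the second direction.
    v1 = vh[0]
    u_damped = (u @ v1) * v1

    # Standard CFG combination, using the damped unconditional score.
    guided = u_damped + scale * (c - u_damped)
    return guided.reshape(shape), u_damped.reshape(shape)

# Usage with random stand-ins for the two model outputs.
rng = np.random.default_rng(0)
cond = rng.normal(size=(4, 8, 8))
uncond = cond + 0.5 * rng.normal(size=(4, 8, 8))
guided, uncond_damped = tcfg_guidance(cond, uncond, scale=7.5)
```

Because the SVD here is taken over a 2 x d matrix per sample, its cost is small relative to a network forward pass, which is consistent with the claim of negligible additional computation.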

Paper Structure

This paper contains 25 sections, 11 equations, 14 figures, 6 tables, 1 algorithm.

Figures (14)

  • Figure 1: (a) Classifier-free guidance. When the unconditional score ${\bm{s}}_\theta({\bm{z}}_t)$ and the conditional score ${\bm{s}}_\theta({\bm{z}}_t,y)$ are misaligned, the result of CFG tends to fall off the manifold. (b) Our proposed method reduces the misalignment between the unconditional score ${\bm{s}}_\theta({\bm{z}}_t)$ and the conditional score ${\bm{s}}_\theta({\bm{z}}_t,y)$, ensuring sampling aligns with the target manifold.
  • Figure 2: Singular values of the score function across all timesteps. We computed the singular values for all timesteps using a total of 17,000 samples from Stable Diffusion v1.5. For both the unconditional and the conditional scores, a significant drop in singular values was observed at indices close to 0 across all timesteps. This suggests the existence of an intermediate manifold.
  • Figure 3: Cosine similarity between singular vectors of unconditional and conditional scores. We computed the singular vectors $V$ at each timestep using a total of 17,000 samples from Stable Diffusion v1.5. We observe that the similarity between significant singular vectors (i.e., those with indices close to 0) of the unconditional and conditional scores is mostly high across all timesteps $T$.
  • Figure 4: Sampling results of different methods with a diffusion model trained on the two moons dataset. Our proposed methods (c, d) match the target distribution more closely than using conditional scores only or CFG. In (c), SVD is computed across all samples, while in (d), SVD is computed separately for each pair of conditional and unconditional scores.
  • Figure 5: Visualization of the sampling trajectory. In CFG (orange path), the unconditional scores (red arrows) include components that point in directions other than the target distribution, causing the final destination to deviate from it. In contrast, our method (green path) removes the inconsistent tangent components of the unconditional scores and eventually reaches the target distribution.
  • ...and 9 more figures
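
The kind of analysis described in the captions of Figures 2 and 3 (a singular-value spectrum over a batch of scores, and the cosine similarity between matching singular vectors of unconditional and conditional scores) can be sketched as follows. The data here is synthetic: random vectors concentrated on a shared 3-dimensional subspace stand in for real model scores, and the helper names `right_singular_basis` and `singular_vector_similarity` are hypothetical.

```python
import numpy as np

def right_singular_basis(scores):
    """Singular values and right-singular vectors of an (n, d) score matrix."""
    _, s, vh = np.linalg.svd(scores, full_matrices=False)
    return s, vh

def singular_vector_similarity(vh_a, vh_b):
    """Absolute cosine similarity between same-index right-singular vectors."""
    # Rows of vh are unit-norm, so the row-wise dot product is the cosine.
    # The sign of a singular vector is arbitrary, hence the absolute value.
    return np.abs(np.sum(vh_a * vh_b, axis=1))

# Synthetic stand-in: both score sets concentrate on the same k directions.
rng = np.random.default_rng(0)
n, d, k = 200, 32, 3
basis = np.linalg.qr(rng.normal(size=(d, k)))[0].T  # (k, d), orthonormal rows
strengths = np.array([10.0, 6.0, 3.0])
uncond = (rng.normal(size=(n, k)) * strengths) @ basis + 0.05 * rng.normal(size=(n, d))
cond = (rng.normal(size=(n, k)) * strengths) @ basis + 0.05 * rng.normal(size=(n, d))

s_u, vh_u = right_singular_basis(uncond)
s_c, vh_c = right_singular_basis(cond)
sim = singular_vector_similarity(vh_u, vh_c)

print("leading singular values (uncond):", np.round(s_u[:5], 2))
print("similarity of leading vectors:", np.round(sim[:k], 3))
```

In this toy setup the spectrum drops sharply after index `k - 1` and the leading vectors of the two sets agree closely, which mirrors the qualitative pattern the captions report; it says nothing about the actual values measured on Stable Diffusion v1.5.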