TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling

Hyunmin Cho, Donghoon Ahn, Susung Hong, Jee Eun Kim, Seungryong Kim, Kyong Hwan Jin

TL;DR

This work tackles hallucinations in diffusion-based image synthesis by reframing inference-time guidance as a geometry-aware trajectory refinement. It introduces Tangential Amplifying Guidance (TAG), which decomposes the base sampling update into normal and tangential components relative to the current latent state and amplifies the tangential part by a factor $\eta \ge 1$, while preserving the radial (noise-schedule) term. Grounded in Tweedie’s identity and a first-order Taylor analysis, TAG provably increases the local log-likelihood gain and steers samples toward higher-density regions of the data manifold without retraining. Empirically, TAG improves FID, IS, and CLIP-based metrics across unconditional and conditional generation, across multiple backbones (including SD v1.5, v2.1, XL, and SD3) and even flow-matching, while reducing compute via fewer NFEs. The method is plug-and-play and architecture-agnostic, offering a practical, low-overhead path to more faithful, hallucination-resistant diffusion sampling.

Abstract

Recent diffusion models achieve state-of-the-art performance in image generation, but often suffer from semantic inconsistencies or hallucinations. While various inference-time guidance methods can enhance generation, they often operate indirectly by relying on external signals or architectural modifications, which introduces additional computational overhead. In this paper, we propose Tangential Amplifying Guidance (TAG), a more efficient and direct guidance method that operates solely on trajectory signals without modifying the underlying diffusion model. TAG uses an intermediate sample as a projection basis and amplifies the tangential components of the estimated scores with respect to this basis to correct the sampling trajectory. We formalize this guidance process with a first-order Taylor expansion, which shows that amplifying the tangential component steers the state toward higher-probability regions, thereby reducing inconsistencies and enhancing sample quality. TAG is a plug-and-play, architecture-agnostic module that improves diffusion sampling fidelity with minimal additional computation, offering a new perspective on diffusion guidance.
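The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the projection basis is the flattened current sample and that the guided increment keeps the normal component and scales the tangential one by $\eta$. The function name `tag_step`, the tensor shape, and the value `eta=1.5` are hypothetical choices for illustration, not the authors' implementation or recommended settings.

```python
import numpy as np

def tag_step(x, delta, eta=1.5):
    """Sketch of a tangentially amplified increment (assumed form).

    x     : current latent sample, used as the projection basis
    delta : base sampler increment, same shape as x
    eta   : amplification factor for the tangential part (eta >= 1)
    """
    x_flat = x.ravel()
    d_flat = delta.ravel()
    # Unit vector along the current sample (small constant avoids division by zero).
    u = x_flat / (np.linalg.norm(x_flat) + 1e-12)
    # Normal (radial) component: projection of the increment onto the sample direction.
    normal = np.dot(d_flat, u) * u
    # Tangential component: the remainder, orthogonal to the sample direction.
    tangential = d_flat - normal
    # Keep the normal part unchanged, amplify the tangential part.
    return (normal + eta * tangential).reshape(delta.shape)

# Toy usage with random tensors standing in for a latent and a sampler increment.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
delta = 0.1 * rng.standard_normal(x.shape)
guided = tag_step(x, delta, eta=1.5)
```

With `eta=1.0` the function returns the base increment unchanged, so in this sketch the guidance reduces to ordinary sampling at that setting. The only extra work per step is one dot product and a few vector operations, with no additional network evaluation.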

Paper Structure

This paper contains 19 sections, 1 theorem, 40 equations, 12 figures, 5 tables, 4 algorithms.

Key Result

Theorem 4.1

Assume a deterministic base step with $\Delta_{k+1} = \tilde{\alpha}_k \epsilon_\theta({\bm{x}}_{k+1}, t_{k+1}) + \beta_k {\bm{x}}_{k+1}$ and $\tilde{\alpha}_k \le 0$. Let ${\bm{P}}_{k+1}\succeq0$ and ${\bm{P}}^\perp_{k+1}\succeq0$ be the projectors defined above. For the TAG step $\Delta_{k+1}^{\mathrm{TAG}}$ [...] and, in particular, [...] Equality holds iff $\eta=1$. The proof is provided in the appendix.
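The following is a hedged sketch of why a monotonicity statement of this kind is plausible, not a reproduction of the paper's proof. It assumes (none of this is stated in the excerpt above) that the projectors are ${\bm{P}} = {\bm{x}}{\bm{x}}^\top / \|{\bm{x}}\|^2$ and ${\bm{P}}^\perp = {\bm{I}} - {\bm{P}}$, that the TAG step has the form ${\bm{P}}\Delta + \eta\,{\bm{P}}^\perp\Delta$, and that the score is approximated as ${\bm{s}} \approx -\epsilon_\theta/\sigma$ with $\sigma > 0$. The symbols ${\bm{s}}$, $\sigma$, and $G(\eta)$ are introduced here only for illustration, and step indices are dropped for readability.

```latex
\begin{align*}
G(\eta) &:= {\bm{s}}^\top \Delta^{\mathrm{TAG}}
         = {\bm{s}}^\top {\bm{P}} \Delta + \eta\, {\bm{s}}^\top {\bm{P}}^\perp \Delta
         && \text{(first-order gain, assumed form)} \\
{\bm{P}}^\perp \Delta &= \tilde{\alpha}\, {\bm{P}}^\perp \epsilon_\theta
         && \text{(since } {\bm{P}}^\perp {\bm{x}} = 0 \text{)} \\
{\bm{s}}^\top {\bm{P}}^\perp \Delta
        &= -\frac{\tilde{\alpha}}{\sigma}\, \epsilon_\theta^\top {\bm{P}}^\perp \epsilon_\theta
         = -\frac{\tilde{\alpha}}{\sigma}\, \| {\bm{P}}^\perp \epsilon_\theta \|^2 \;\ge\; 0
         && \text{(using } \tilde{\alpha} \le 0 \text{)} \\
G(\eta) - G(1) &= (\eta - 1)\left( -\frac{\tilde{\alpha}}{\sigma} \right) \| {\bm{P}}^\perp \epsilon_\theta \|^2 \;\ge\; 0
         && \text{for } \eta \ge 1
\end{align*}
```

Under these assumptions the gain is nondecreasing in $\eta$, and when $\tilde{\alpha} < 0$ and ${\bm{P}}^\perp \epsilon_\theta \neq 0$ the difference vanishes only at $\eta = 1$, which matches the equality condition quoted in the theorem.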

Figures (12)

  • Figure 1: Conceptual visualization of Tangential Amplifying Guidance (TAG) from a mode-interpolation perspective [aithal2024understandinghallucinationsdiffusionmodels]. Unlike (a) the no-guidance case, (b) TAG decomposes the base increment $\Delta_{k+1}$ on the latent sphere into parallel ${\bm{P}}_{k+1}\Delta_{k+1}$ and orthogonal (i.e., tangential) ${\bm{P}}_{k+1}^\perp\Delta_{k+1}$ components (see the TAG update rule). By preserving the parallel component while adding a scaled tangential component, TAG isolates the data-relevant part of the update and can navigate the data manifold more effectively, leading to samples that contain more semantic structure. We make this precise by proving that amplifying the tangential component guides trajectories toward regions of higher model density while mitigating off-manifold drift (see the first-order Taylor gain in the theory section).
  • Figure 2: Amplifying the tangential component enhances semantic content by isolating it from noise. This figure illustrates the decomposition of the update step $\Delta_k$ into normal and tangential components. Subtracting the unstructured, noisy normal component ${\bm{P}}_k \Delta_k$ from the original update acts as a denoising operation, revealing the tangential component ${\bm{P}}_k^{\perp}\Delta_k$, which preserves the principal semantic structure. Images decoded from intermediate timesteps ($t = 981$ and $t = 501$) indicate that semantic information is most salient in the tangential component. Motivated by this observation, our method $\Delta_k^{\mathrm{TAG}}$ amplifies this semantically rich component, yielding a clearer and more coherent final sample (far right) than that obtained from the unmodified $\Delta_k$ (zoom in for details).
  • Figure 3: Sampling on a 2D branching distribution [karras2024guiding] under different guidance methods. (a) No guidance: probability mass drifts off the data manifold, yielding fragmented branches and out-of-distribution (OOD) points. (b) Naive truncation: suppresses some OOD points but oversimplifies the geometry, dropping fine branches. (c) CFG: reduces boundary violations but also reduces diversity and can still leave OOD strays in our run. (d) TAG (Ours): trajectories are steered toward high-density regions along the branches, suppressing off-manifold outliers while retaining detail. (e) Ground truth (GT). Overall, TAG achieves the highest similarity to the GT distribution without additional NFEs, concentrating mass on the correct branches while substantially reducing residual OOD outliers.
  • Figure 4: Effectiveness of TAG. At 50 NFEs, TAG surpasses the sample quality of the baseline at 250 NFEs. In contrast, +Normal causes severe over-smoothing.
  • Figure 6: Qualitative comparison of TAG across unconditional and conditional generation settings. The left four columns demonstrate that for unconditional generation, TAG enhances the detail and coherence of samples from SD3 [podell2024sdxl]. The right four columns show that for conditional generation, TAG can be applied on top of existing guidance methods (e.g., PAG [ahn2024self], SEG [hong2024smoothed]) to further improve their outputs.
  • ...and 7 more figures

Theorems & Definitions (2)

  • Theorem 4.1: Monotonicity of the First-order Taylor Gain
  • proof