Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models

Youngrok Park, Hojung Jung, Sangmin Bae, Se-Young Yun

TL;DR

This work addresses the off-manifold drift that occurs when diffusion models are externally guided. It introduces Temporal Alignment Guidance (TAG), combining the standard diffusion score with a Time-Linked Score (TLS) derived from a time predictor to keep samples aligned with the data manifold at every reverse step. The authors provide theoretical guarantees showing TAG reduces the total-variation distance to the target distribution and demonstrate its effectiveness across zero-shot and multi-conditional tasks, few-step generation, and large-scale text-to-image generation with reward and style objectives. TAG offers a training-free, inference-time correction that enhances fidelity under arbitrary guidance, enabling more reliable real-world diffusion-based generation.

Abstract

Diffusion models have achieved remarkable success as generative models. However, even a well-trained model can accumulate errors throughout the generation process. These errors become particularly problematic when arbitrary guidance is applied to steer samples toward desired properties, which often breaks sample fidelity. In this paper, we propose a general solution to address the off-manifold phenomenon observed in diffusion models. Our approach leverages a time predictor to estimate deviations from the desired data manifold at each timestep, identifying that a larger time gap is associated with reduced generation quality. We then design a novel guidance mechanism, "Temporal Alignment Guidance" (TAG), attracting the samples back to the desired manifold at every timestep during generation. Through extensive experiments, we demonstrate that TAG consistently produces samples closely aligned with the desired manifold at each timestep, leading to significant improvements in generation quality across various downstream tasks.
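To make the mechanism concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes 1-D standard-normal data under a variance-exploding forward process, so each noisy marginal $p_{t_k}$ is a zero-mean Gaussian with known variance, and it replaces the learned time predictor with the Bayes-optimal posterior $p(t_k \mid x)$ under a uniform prior over timesteps. Under those assumptions, Bayes' rule gives $\nabla_x \log p(t_i \mid x) = \nabla_x \log p_{t_i}(x) - \sum_k p(t_k \mid x)\, \nabla_x \log p_{t_k}(x)$. The function names, the variance schedule, and the additive combination with weight `omega` are illustrative assumptions.

```python
import numpy as np

def gaussian_score(x, var):
    """Score (gradient of log-density) of a zero-mean Gaussian with variance var."""
    return -x / var

def log_gaussian(x, var):
    """Log-density of a zero-mean Gaussian with variance var."""
    return -0.5 * (x ** 2 / var + np.log(2.0 * np.pi * var))

def time_posterior(x, variances):
    """Ideal time predictor p(t_k | x), assuming a uniform prior over timesteps."""
    logits = np.array([log_gaussian(x, v) for v in variances])
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def time_linked_score(x, i, variances):
    """Gradient of log p(t_i | x) with respect to x, via Bayes' rule (toy setting)."""
    posterior = time_posterior(x, variances)
    scores = np.array([gaussian_score(x, v) for v in variances])
    return scores[i] - np.dot(posterior, scores)

def tag_score(x, i, variances, omega):
    """Assumed combination: diffusion score plus omega times the time-linked score."""
    return gaussian_score(x, variances[i]) + omega * time_linked_score(x, i, variances)

# Hypothetical marginal variances 1 + sigma_t^2 for four timesteps.
variances = [1.01, 2.0, 5.0, 17.0]

# A sample far from the origin looks like it belongs to a noisier timestep;
# the time-linked score for the least-noisy target time points back inward.
x_off = 5.0
print("time posterior:", time_posterior(x_off, variances))
print("time-linked score:", time_linked_score(x_off, 0, variances))
print("combined score:", tag_score(x_off, 0, variances, omega=1.0))
```

In this toy setting the correction vanishes when the posterior is concentrated on the target timestep and grows as the sample looks more like it came from a different noise level, which matches the "time gap" intuition in the abstract. In practice the posterior would come from a trained time predictor and its gradient from automatic differentiation.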

Paper Structure

This paper contains 131 sections, 14 theorems, 138 equations, 19 figures, 20 tables, 2 algorithms.

Key Result

Theorem 3.3

Assuming discrete diffusion timesteps $[t_1,t_2,\dots,t_n]$, the Time-Linked Score (TLS) of a random noisy sample $\mathbf{x}$ with respect to the target time $t_i$ can be represented as:

Figures (19)

  • Figure 1: Overview of the TAG algorithm. (Left) Without TAG, external guidance pushes samples off-manifold, causing the standard diffusion step $\nabla_x \log p(x)$ to miss the target manifold $\mathcal{M}_{t_{i-1}}$. TAG's correction actively steers the sample back to the correct manifold $\mathcal{M}_{t_i}$, ensuring the diffusion step accurately reaches the desired manifold $\mathcal{M}_{t_{i-1}}$. (Right) Applying TAG can greatly improve fidelity in conditional generation tasks with target conditions: worm for ImageNet, polarizability $\alpha$ for Molecule, female and black hair for CelebA.
  • Figure 1: Effect of TAG across guidance strengths $\omega$ when the reverse process is corrupted with noise level $\sigma$.
  • Figure 2: Generated samples with score field. (Left) Generated outputs from the reverse diffusion process with external drift, shown with the vector field of the diffusion model output at $t=0$. (Right) Generated outputs when applying TAG with external drift, shown with the vector field of the TLS at $t=0$.
  • Figure 3: FID values over different corruption levels for original diffusion process without TAG and with TAG.
  • Figure 4: Time gap in CIFAR10.
  • ...and 14 more figures

Theorems & Definitions (30)

  • Definition 3.1
  • Definition 3.2
  • Theorem 3.3
  • Proposition 3.4
  • Theorem 3.5
  • Proposition B.1
  • proof
  • Proposition B.2
  • proof
  • Definition B.3
  • ...and 20 more