Align Your Tangent: Training Better Consistency Models via Manifold-Aligned Tangents

Beomsu Kim, Byunghee Cha, Jong Chul Ye

TL;DR

This work identifies oscillatory CM update directions, which run parallel to the data manifold rather than toward it, as a key bottleneck in training consistency models. It introduces the manifold feature distance (MFD), a self-supervised loss built on learned manifold features that aligns tangents toward the data manifold; the resulting method is called Align Your Tangent (AYT). Empirically, AYT accelerates convergence by orders of magnitude, surpasses LPIPS in several benchmarks, and remains robust to very small batch sizes on CIFAR10 and ImageNet 64×64. By tying optimization geometry to the data manifold, the approach offers a practical, scalable path to faster, high-quality few-step generative modeling.

Abstract

With diffusion and flow matching models achieving state-of-the-art generation performance, the community's interest has now turned to reducing inference time without sacrificing sample quality. Consistency Models (CMs), which are trained to be consistent on diffusion or probability flow ordinary differential equation (PF-ODE) trajectories, enable one- or two-step flow or diffusion sampling. However, CMs typically require prolonged training with large batch sizes to obtain competitive sample quality. In this paper, we examine the training dynamics of CMs near convergence and discover that CM tangents (CM output update directions) are quite oscillatory, in the sense that they move parallel to the data manifold, not towards the manifold. To mitigate oscillatory tangents, we propose a new loss function, called the manifold feature distance (MFD), which provides manifold-aligned tangents that point toward the data manifold. Consequently, our method, dubbed Align Your Tangent (AYT), can accelerate CM training by orders of magnitude and even outperform the learned perceptual image patch similarity metric (LPIPS). Furthermore, we find that our loss enables training with extremely small batch sizes without compromising sample quality. Code: https://github.com/1202kbs/AYT
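
To make the distinction between "parallel to the manifold" and "towards the manifold" concrete, here is a minimal sketch on a toy problem. It assumes the data manifold is the unit circle in 2D, so the normal direction at any point is simply the radial direction. The function names `decompose_tangent` and `orthogonal_fraction` are hypothetical and are not taken from the paper or its repository; this is an illustration of the geometry, not the authors' measurement procedure.

```python
import math


def decompose_tangent(y, v):
    """Split an update direction v at point y into components that are
    orthogonal and parallel to the unit circle (the toy manifold assumed here)."""
    norm_y = math.hypot(y[0], y[1])
    if norm_y == 0.0:
        raise ValueError("the normal direction is undefined at the origin")
    # For a circle centred at the origin, the manifold normal is radial.
    n = (y[0] / norm_y, y[1] / norm_y)
    dot = v[0] * n[0] + v[1] * n[1]
    v_orth = (dot * n[0], dot * n[1])            # moves towards/away from the circle
    v_par = (v[0] - v_orth[0], v[1] - v_orth[1])  # slides along the circle
    return v_orth, v_par


def orthogonal_fraction(y, v):
    """Fraction of the squared norm of v that is manifold-orthogonal."""
    sq_norm = v[0] ** 2 + v[1] ** 2
    if sq_norm == 0.0:
        return 0.0
    v_orth, _ = decompose_tangent(y, v)
    return (v_orth[0] ** 2 + v_orth[1] ** 2) / sq_norm


# A point outside the circle, with two candidate update directions.
y = (1.5, 0.0)
print(orthogonal_fraction(y, (-0.5, 0.0)))  # points straight at the circle: 1.0
print(orthogonal_fraction(y, (0.0, 0.5)))   # slides parallel to the circle: 0.0
```

In this toy picture, an update with a fraction near 0 moves the output along the manifold without bringing it closer, which is the oscillatory behavior the abstract describes; a fraction near 1 corresponds to a manifold-aligned update.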

Paper Structure

This paper contains 19 sections, 11 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Left: CM tangents, i.e., CM output update directions, exhibit large oscillations throughout training. To mitigate this, we learn feature maps $\phi$ whose level sets $\phi^{-1}(\alpha)$ model increasingly perturbed data manifolds, so feature map gradients point towards the manifold. CM tangents in the feature space are expressed as linear combinations of feature map gradients, so we obtain manifold-aligned tangents. Right: Manifold-aligned tangents (AYT) enable up to $\times 10$ faster convergence and competitive FIDs with $\times 1/8$ batch size (bs). We use Easy Consistency Training (ECT) (Geng et al., 2025). Shaded regions indicate min/max FIDs over three sample generation trials.
  • Figure 2: CM tangent visualization on CIFAR10 after training to near-convergence ($400k$ iterations). First row: inputs ${\bm{x}}_t = {\bm{x}}_0 + t {\bm{\epsilon}}$. Second row: outputs ${\bm{f}}_{\bm{\theta}}({\bm{x}}_t,t)$. Third row: vanilla CM tangents computed with Eq. (\ref{eq:disc_tangent}). Tangents are averaged along the channel dimension for visualization, and red and blue pixels indicate positive and negative values, respectively. Fourth row: manifold-aligned tangents (AYT) computed with Eq. (\ref{eq:feature_tangent}).
  • Figure 3: Tangent analysis on 2D discs after training to near-convergence ($200k$ iterations) for vanilla CM and Align Your Tangent (AYT). In each figure, we visualize CM inputs, CM outputs, CM tangents, the manifold-parallel component of the tangents, and the manifold-orthogonal component of the tangents.
  • Figure 4: Amount of manifold-orthogonal components in tangents for vanilla CM and our manifold-aligned tangents (AYT) throughout training.
  • Figure 5: Ablation studies on CIFAR10. For transformation ablation and AYT vs. LPIPS, we use batch size $64$. Shaded regions indicate min/max FIDs over three generation trials.
  • ...and 3 more figures