SCoT: Unifying Consistency Models and Rectified Flows via Straight-Consistent Trajectories

Zhangkai Wu, Xuhui Fan, Hongyu Wu, Longbing Cao

TL;DR

SCoT introduces a unified trajectory distillation framework that simultaneously enforces straightness and consistency in diffusion-model trajectories, bridging consistency-model distillation and rectified-flow distillation without relying on heavy ODE solvers. It learns a trajectory projection $G_{\boldsymbol{\phi}}(\mathbf{x}_t, t, s)$ regulated by a velocity loss and a soft-consistency loss, enabling high-quality image generation with only a few function evaluations. Empirical results on CIFAR-10 and ImageNet show competitive FID, Recall, and IS with low NFEs, and ablations confirm the importance of both trajectory straightening and consistency guarantees. The work promises practical impact for fast, high-fidelity sampling in resource-constrained settings and provides a foundation for extending to high-resolution and conditional generation tasks.

Abstract

Pre-trained diffusion models are commonly used to generate clean data (e.g., images) from random noise, effectively forming pairs of noise samples and corresponding clean images. Distillation of these pre-trained models can be viewed as the process of constructing improved trajectories between each pair to accelerate sampling. For instance, consistency model distillation develops consistent projection functions to regulate trajectories, although sampling efficiency remains a concern. The rectified flow method enforces straight trajectories to enable faster sampling, yet relies on numerical ODE solvers, which may introduce approximation errors. In this work, we bridge the gap between the consistency model and the rectified flow method by proposing a Straight Consistent Trajectory (SCoT) model. SCoT enjoys the benefits of both approaches for fast sampling, producing trajectories that are simultaneously consistent and straight. These dual properties are balanced by targeting two objectives: (1) regulating the gradient of SCoT's mapping to a constant, and (2) ensuring trajectory consistency. Extensive experimental results demonstrate the effectiveness and efficiency of SCoT.
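The two objectives named in the abstract can be illustrated numerically on a toy example. The sketch below is not the paper's training procedure or loss: `make_map`, `point_on_path`, `consistency_gap`, and `velocity_variation` are hypothetical helpers, and the "oracle" projection is built from two known endpoints rather than learned. It compares a straight path with a curved one, showing that both can be consistent while only the straight one has a constant gradient with respect to the target time `s`.

```python
import numpy as np

def make_map(x0, x1, power):
    """Hypothetical oracle projection G(x_t, t, s) for the toy path
    x_t = x0 + t**power * (x1 - x0). power=1 is straight, power=2 is curved."""
    v = x1 - x0
    def G(x_t, t, s):
        # Move from the point at time t to the point at time s on the same path.
        return x_t + (s**power - t**power) * v
    return G

def point_on_path(x0, x1, t, power):
    """Point on the toy path at time t."""
    return x0 + t**power * (x1 - x0)

def consistency_gap(G, x_a, t_a, x_b, t_b, s):
    """Largest coordinate difference between the two projections to time s."""
    return float(np.max(np.abs(G(x_a, t_a, s) - G(x_b, t_b, s))))

def velocity_variation(G, x_t, t, s_grid, eps=1e-4):
    """How much dG/ds (central finite difference) changes across s_grid."""
    vel = np.stack([(G(x_t, t, s + eps) - G(x_t, t, s - eps)) / (2 * eps)
                    for s in s_grid])
    return float(np.max(np.abs(vel - vel[0])))

# Two assumed endpoints of a single trajectory (illustrative values only).
x0 = np.array([0.0, 1.0])
x1 = np.array([2.0, -1.0])
s_grid = np.linspace(0.1, 0.9, 5)

results = {}
for name, power in [("straight", 1), ("curved", 2)]:
    G = make_map(x0, x1, power)
    x_a = point_on_path(x0, x1, 0.8, power)
    x_b = point_on_path(x0, x1, 0.5, power)
    results[name] = (
        consistency_gap(G, x_a, 0.8, x_b, 0.5, 0.2),
        velocity_variation(G, x_a, 0.8, s_grid),
    )

for name, (gap, var) in results.items():
    print(f"{name}: consistency gap = {gap:.2e}, velocity variation = {var:.2e}")
```

In this toy setting both maps send two points of the same path to the same target, but only the straight map has a near-zero velocity variation, which is the combination of properties the abstract attributes to SCoT.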

Paper Structure

This paper contains 32 sections, 11 equations, 6 figures, 11 tables, 1 algorithm.

Figures (6)

  • Figure 1: Comparison of trajectory distillation methods. The black line in each panel denotes the teacher trajectory of the pre-trained diffusion model, which connects a random noise sample (left dot) to a clean image (right dot). The red solid line is the student trajectory of the distillation model. In panel (a), Reflow [liu2023flow] straightens its student trajectory by enforcing its velocity to be close to a constant. However, its trajectory maps different points to different values due to the lack of consistency. In panel (b), CTM [kim2023consistency] imposes a consistency requirement on the student trajectory. However, the student trajectory may be difficult to track when it has high curvature. In panel (c), the Shortcut model [frans2025one] focuses on velocity estimation and uses straight lines to approximate the trajectory and ensure consistency. In panel (d), our proposed SCoT model enforces straightness on a consistent student trajectory. By avoiding the approximation errors of solving ODEs and by straightening the student trajectory, SCoT bridges the gap between rectified flows and CTM distillation and enjoys the benefits of both approaches.
  • Figure 2: From two different points $\mathbf{x}_{t_1}, \mathbf{x}_{t_2}$, SCoT maps to the same point $\widehat{\mathbf{x}}_s$. The velocity $\boldsymbol{\mu}_{\boldsymbol{\phi}}(\widehat{\mathbf{x}}_s, s)$ at time step $s$ is independent of the previous time steps $t_1, t_2$.
  • Figure 3: 1-step generation at the initial training stage for ImageNet by the SCoT sampling algorithm.
  • Figure 4: 1-step generation at the 10k training stage for ImageNet by the SCoT sampling algorithm.
  • Figure 5: 1-step generation at the 30k training stage for ImageNet by the SCoT sampling algorithm.
  • ...and 1 more figure
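
The property described in the Figure 2 caption can be written compactly. This is only a restatement in the caption's notation: reading the velocity as the derivative of the mapping with respect to the target time $s$ is an assumption made here for illustration, $\mathbf{c}$ is a hypothetical constant vector rather than a symbol from the paper, and the two starting points are assumed to lie on the same trajectory.

```latex
\begin{align}
  G_{\boldsymbol{\phi}}(\mathbf{x}_{t_1}, t_1, s)
    &= G_{\boldsymbol{\phi}}(\mathbf{x}_{t_2}, t_2, s)
     = \widehat{\mathbf{x}}_s
    && \text{(consistency)} \\
  \frac{\partial}{\partial s}\, G_{\boldsymbol{\phi}}(\mathbf{x}_{t}, t, s)
    &= \boldsymbol{\mu}_{\boldsymbol{\phi}}(\widehat{\mathbf{x}}_s, s)
     \approx \mathbf{c}
    && \text{(straightness, assumed form)}
\end{align}
```

The first line says the target point does not depend on where along the trajectory the projection starts; the second says the velocity at $s$ depends only on $(\widehat{\mathbf{x}}_s, s)$ and, for a straight trajectory, stays close to a constant.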