Dual-End Consistency Model

Linwei Dong, Ruoyu Guo, Ge Bai, Zehuan Yuan, Yawei Luo, Changqing Zou

TL;DR

This work targets the slow, iterative sampling of diffusion and flow-based generative models by diagnosing two core shortcomings of consistency models: training instability and inflexible sampling. It introduces the Dual-End Consistency Model (DE-CM), which decomposes the PF-ODE trajectory and selects three sub-trajectory clusters (consistency, instantaneous, and noise-to-noisy) as optimization targets. These are trained with continuous-time consistency distillation, flow matching as a boundary regularizer, and a novel noise-to-noisy (N2N) mapping, respectively. The method yields stable training, few-step distillation, and flexible sampling, reaching $\text{FID}=1.70$ at $1$-NFE on ImageNet-$256\times256$, which is state of the art among CM-based one-step approaches, and it performs strongly on both class-to-image and text-to-image tasks across a range of inference budgets. The work makes high-quality, low-NFE generation more practical to deploy, though it notes memory constraints from the Jacobian-vector product in large-scale, distributed setups.

Abstract

Slow iterative sampling remains a major bottleneck for the practical deployment of diffusion and flow-based generative models. While consistency models (CMs) represent a state-of-the-art distillation-based approach for efficient generation, their large-scale application is still limited by two key issues: training instability and inflexible sampling. Existing methods seek to mitigate these problems through architectural adjustments or regularized objectives, yet overlook the critical reliance on trajectory selection. In this work, we first analyze these two limitations: training instability originates from loss divergence induced by an unstable self-supervised term, whereas sampling inflexibility arises from error accumulation. Based on this analysis, we propose the Dual-End Consistency Model (DE-CM), which selects vital sub-trajectory clusters to achieve stable and effective training. DE-CM decomposes the PF-ODE trajectory and selects three critical sub-trajectories as optimization targets. Specifically, our approach leverages continuous-time CM objectives to achieve few-step distillation and uses flow matching as a boundary regularizer to stabilize training. Furthermore, we propose a novel noise-to-noisy (N2N) mapping that can map noise to any point on the trajectory, thereby alleviating error accumulation in the first step. Extensive experiments show the effectiveness of our method: it achieves a state-of-the-art FID score of 1.70 in one-step generation on the ImageNet 256x256 dataset, outperforming existing CM-based one-step approaches.

Paper Structure (17 sections, 18 equations, 7 figures, 4 tables, 1 algorithm)

Figures (7)

  • Figure 1: Comparison of FID scores across different models under various NFE settings, showing the superior performance of our method in both few-step and multi-step sampling.
  • Figure 2: Comparison of learning objectives. (a): CM distillation (Song et al., 2023) ($s = T$), Flow Matching (Lipman et al., 2022) ($s = t$), MeanFlow (Geng et al., 2025) / AYF (Sabour et al., 2025) (upper triangle). (b): DE-CM (triangular boundary).
  • Figure 3: We select significant trajectories from the whole $\{(t,s)|t<s \}$ space and treat these selected trajectories as our optimization targets. Specifically, we employ the continuous-time consistency distillation trajectory to optimize the mapping from arbitrary time points to data, thereby achieving few-step distillation. We leverage the proposed noise-to-noisy (N2N) mapping objective to eliminate the constraint of predicting only $x_1$ points. We utilize flow matching loss to overcome the instability in training. We find that incorporating these three trajectory clusters is enough to enable stable and efficient optimization for few-step distillation.
  • Figure 4: Comparison of gradient norm curves with and without flow matching boundary constraints.
  • Figure 5: Illustration of Dual-End sampling, including Euler ODE sampler, CM sampler and Mix sampler.
  • ...and 2 more figures
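
The trajectory selection described in the captions of Figures 2 and 3 can be illustrated by sampling time pairs $(t, s)$ on the three edges of the triangle rather than over its whole interior. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it assumes noise sits at time $0$ and data at time $T = 1$, it reads the N2N cluster as the edge $t = 0$, and the names `sample_pair`, `sample_batch` and the uniform mixing `weights` are hypothetical.

```python
import random

T = 1.0  # assumed convention: noise at time 0, data at time T


def sample_pair(cluster, rng):
    """Return a time pair (t, s) with 0 <= t <= s <= T on one triangle edge."""
    u = rng.uniform(0.0, T)
    if cluster == "consistency":
        # s = T: map an arbitrary time t to the data end (CM distillation).
        return (u, T)
    if cluster == "instantaneous":
        # s = t: the diagonal, where the target is the instantaneous
        # velocity (flow matching).
        return (u, u)
    if cluster == "n2n":
        # t = 0 (assumed reading): map pure noise to an arbitrary time s.
        return (0.0, u)
    raise ValueError(f"unknown cluster: {cluster}")


def sample_batch(n, weights=(1.0, 1.0, 1.0), seed=0):
    """Draw n (cluster, t, s) triples; the mixing weights are hypothetical."""
    rng = random.Random(seed)
    names = ("consistency", "instantaneous", "n2n")
    picks = rng.choices(names, weights=weights, k=n)
    return [(name, *sample_pair(name, rng)) for name in picks]


if __name__ == "__main__":
    for name, t, s in sample_batch(5):
        print(f"{name:14s} t={t:.3f} s={s:.3f}")
```

Each sampled pair would select which loss term applies to that training example; how the three losses are weighted against each other is not specified in this summary, so the uniform weights here are a placeholder.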