Dual-End Consistency Model
Linwei Dong, Ruoyu Guo, Ge Bai, Zehuan Yuan, Yawei Luo, Changqing Zou
TL;DR
This work addresses the slow, iterative sampling of diffusion and flow-based generative models by diagnosing two core shortcomings of consistency models: training instability and inflexible sampling. It introduces the Dual-End Consistency Model (DE-CM), which decomposes the PF-ODE trajectory and selects three sub-trajectory clusters (consistency, instantaneous, and noise-to-noisy) as optimization targets. These are trained with a combination of continuous-time consistency distillation, flow matching as a boundary regularizer, and a novel noise-to-noisy (N2N) mapping. The method yields stable training, few-step distillation, and flexible sampling. It reports an FID of $1.70$ at $1$-NFE on ImageNet $256\times256$, which the authors describe as state of the art among CM-based one-step approaches, and it performs well on both class-to-image and text-to-image tasks across a range of inference budgets. The work makes high-quality, low-NFE generation more practical for diffusion and flow models, though it notes memory constraints caused by the Jacobian-vector product in large-scale, distributed setups.
Abstract
Slow, iterative sampling remains a major bottleneck for the practical deployment of diffusion and flow-based generative models. While consistency models (CMs) represent a state-of-the-art distillation-based approach for efficient generation, their large-scale application is still limited by two key issues: training instability and inflexible sampling. Existing methods seek to mitigate these problems through architectural adjustments or regularized objectives, yet overlook the critical reliance on trajectory selection. In this work, we first analyze these two limitations: training instability originates from loss divergence induced by an unstable self-supervised term, whereas sampling inflexibility arises from error accumulation. Based on this analysis, we propose the Dual-End Consistency Model (DE-CM), which selects vital sub-trajectory clusters to achieve stable and effective training. DE-CM decomposes the PF-ODE trajectory and selects three critical sub-trajectories as optimization targets. Specifically, our approach leverages continuous-time CM objectives to achieve few-step distillation and uses flow matching as a boundary regularizer to stabilize the training process. Furthermore, we propose a novel noise-to-noisy (N2N) mapping that can map noise to any point on the trajectory, thereby alleviating the error accumulation in the first step. Extensive experimental results show the effectiveness of our method: it achieves a state-of-the-art FID score of 1.70 in one-step generation on the ImageNet 256×256 dataset, outperforming existing CM-based one-step approaches.
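To make the three sub-trajectory types more concrete, here is a toy one-dimensional sketch. It is not the paper's implementation. It assumes a linear path $x_t = (1-t)\,x_0 + t\,\epsilon$ with $t=1$ as pure noise, and a hypothetical data distribution concentrated on a single point `MU`, so that every target is available in closed form. The function names and the reading of each sub-trajectory (instantaneous as the velocity at $t$, consistency as the jump from $t$ to $0$, N2N as the jump from $1$ to $t$) are assumptions made for illustration.

```python
# Toy 1-D illustration of the three mappings; NOT the authors' code.
# Assumption: linear path x_t = (1 - t) * x0 + t * eps, t = 1 is noise, t = 0 is data.
# Assumption: the data distribution is the single point MU, so targets are closed-form.

MU = 2.0  # hypothetical data point


def velocity(x_t, t):
    """Instantaneous target: dx/dt on the path, i.e. eps - x0 (flow-matching style)."""
    return (x_t - MU) / t


def consistency(x_t, t):
    """Consistency target: jump from a point at time t to the t = 0 endpoint."""
    return x_t - t * velocity(x_t, t)


def noise_to_noisy(eps, t):
    """N2N target: jump from pure noise (t = 1) to the point at time t."""
    return eps - (1.0 - t) * velocity(eps, 1.0)


def euler_sample(eps, steps):
    """Baseline: integrate the PF-ODE from t = 1 down to t = 0 with Euler steps."""
    x, t = eps, 1.0
    dt = 1.0 / steps
    for _ in range(steps):
        x = x - dt * velocity(x, t)
        t -= dt
    return x


def two_step_sample(eps, t_mid):
    """Two-step sampling: noise -> noisy point at t_mid -> data."""
    return consistency(noise_to_noisy(eps, t_mid), t_mid)
```

In this toy setting, `consistency(eps, 1.0)` plays the role of one-step generation, while `two_step_sample(eps, 0.5)` reaches an intermediate point directly from noise before the final jump. In a learned model each of these closed-form functions would be replaced by a network output, which is where training stability and error accumulation become relevant.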
