Distribution Backtracking Builds A Faster Convergence Trajectory for Diffusion Distillation
Shengyuan Zhang, Ling Yang, Zejian Li, An Zhao, Chenye Meng, Changyuan Yang, Guang Yang, Zhiyuan Yang, Lingyun Sun
TL;DR
This work tackles the slow sampling speed of diffusion models by introducing Distribution Backtracking Distillation (DisBack), which exploits the entire convergence trajectory between a teacher diffusion model and a student generator. DisBack consists of a Degradation Recording stage that builds a degradation path from the teacher to the initial student, and a Distribution Backtracking stage that reverses this path to guide the student along the teacher’s convergence trajectory, significantly accelerating distillation. Empirical results across CIFAR10, FFHQ-64, ImageNet-64, and text-to-image tasks show that DisBack achieves faster convergence while maintaining or improving generation quality, with substantial improvements over baseline score-distillation methods. The method is simple to implement, orthogonal to existing distillation strategies, and supported by public code, making it broadly applicable to accelerate diffusion-based one-step generation. Overall, DisBack provides a principled, trajectory-aware alternative to endpoint-only distillation, enabling practical high-quality, fast diffusion-based generation in diverse settings.
Abstract
Accelerating the sampling speed of diffusion models remains a significant challenge. Recent score distillation methods distill a heavy teacher model into a student generator to achieve one-step generation; the student is optimized by calculating the difference between the two score functions on the samples it generates. However, a score mismatch arises in the early stage of distillation, because existing methods mainly use the endpoint of pre-trained diffusion models as the teacher, overlooking the importance of the convergence trajectory between the student generator and the teacher model. To address this issue, we extend the score distillation process by introducing the entire convergence trajectory of teacher models and propose Distribution Backtracking Distillation (DisBack). DisBack is composed of two stages: Degradation Recording and Distribution Backtracking. Degradation Recording is designed to obtain the convergence trajectory of the teacher model by recording the degradation path from the trained teacher model to the untrained initial student generator. The degradation path implicitly represents the teacher model's intermediate distributions, and its reverse can be viewed as the convergence trajectory from the student generator to the teacher model. Distribution Backtracking then trains the student generator to backtrack through the intermediate distributions along this path, approximating the convergence trajectory of the teacher model. Extensive experiments show that DisBack achieves faster and better convergence than existing distillation methods and accomplishes comparable generation performance, with an FID score of 1.38 on the ImageNet 64x64 dataset. Notably, DisBack is easy to implement and can be generalized to existing distillation methods to boost performance. Our code is publicly available at https://github.com/SYZhang0805/DisBack.
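To make the two-stage structure concrete, the following is a minimal toy sketch, not the released implementation. It assumes one-dimensional, unit-variance Gaussians so that every score function is analytic: the "teacher" is N(teacher_mean, 1), the "student generator" maps noise z to z + theta, and a score model is represented by its mean alone. The function names, hyperparameters, and the choice to skip the last checkpoint during backtracking are all hypothetical and chosen only for illustration.

```python
import random


def score(x, mean):
    """Score (gradient of the log-density) of a unit-variance Gaussian N(mean, 1)."""
    return -(x - mean)


def degradation_recording(teacher_mean, student_mean, n_ckpt=10,
                          steps_per_ckpt=5, lr=0.1, batch=256, seed=0):
    """Stage 1 (toy): start from the teacher and fit it to samples of the
    initial student, saving checkpoints along the way.

    Returns the degradation path [teacher, ..., approx. initial student].
    """
    rng = random.Random(seed)
    mean = teacher_mean
    path = [mean]
    for _ in range(n_ckpt):
        for _ in range(steps_per_ckpt):
            xs = [rng.gauss(student_mean, 1.0) for _ in range(batch)]
            # Maximum-likelihood step for the mean (a stand-in for score
            # matching): d/d(mean) log N(x; mean, 1) = x - mean.
            grad = sum(x - mean for x in xs) / batch
            mean += lr * grad
        path.append(mean)
    return path


def distribution_backtracking(path, student_mean, steps_per_ckpt=50,
                              lr=0.1, batch=256, seed=1):
    """Stage 2 (toy): walk the path in reverse, distilling the student
    toward each intermediate distribution in turn.

    The last checkpoint approximates the initial student, so this sketch
    skips it (an assumption of the toy, not a claim about the paper).
    """
    rng = random.Random(seed)
    theta = student_mean
    for target_mean in reversed(path[:-1]):
        for _ in range(steps_per_ckpt):
            # Samples from the current one-step student generator.
            xs = [theta + rng.gauss(0.0, 1.0) for _ in range(batch)]
            # Score-distillation gradient: difference between the student's
            # score and the target's score on student samples. Here the
            # student's score is analytic; score-distillation methods
            # typically estimate it with a separately trained network.
            grad = sum(score(x, theta) - score(x, target_mean) for x in xs) / batch
            theta -= lr * grad
    return theta


if __name__ == "__main__":
    path = degradation_recording(teacher_mean=2.0, student_mean=-2.0)
    theta = distribution_backtracking(path, student_mean=-2.0)
    print("degradation path:", [round(m, 2) for m in path])
    print("final student mean:", round(theta, 3))
```

In this toy setting the student never has to match the distant teacher directly: each backtracking stage targets a checkpoint that lies close to the student's current distribution, which is the intuition behind following the convergence trajectory rather than only its endpoint.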
