Elucidating the Preconditioning in Consistency Distillation

Kaiwen Zheng, Guande He, Jianfei Chen, Fan Bao, Jun Zhu

TL;DR

This work analyzes the design of preconditioning for consistency distillation in diffusion models and connects preconditioning to discretizations of the teacher PF-ODE. It introduces Analytic-Precond, an analytically computed preconditioning that minimizes the consistency gap by choosing the parameters $l_t$ and $s_t$ of a generalized teacher ODE, which makes trajectory jumpers easier to learn and better aligned with the teacher. Empirically, Analytic-Precond delivers 2×–3× training acceleration for CTMs in multi-step generation and improves trajectory alignment, while performing comparably to prior preconditionings in CMs. The coefficients are cheap to compute (under 1% of training time), and the method shows strong speed-quality trade-offs on CIFAR-10, FFHQ-64, and ImageNet-64. The authors acknowledge that gains in final FID are modest and that faster generative systems carry potential for misuse.

Abstract

Consistency distillation is a prevalent way of accelerating diffusion models, adopted in consistency (trajectory) models, in which a student model is trained to traverse backward on the probability flow (PF) ordinary differential equation (ODE) trajectory determined by the teacher model. Preconditioning is a vital technique for stabilizing consistency distillation: the consistency function is formed by linearly combining the input data and the network output with pre-defined coefficients. It imposes the boundary condition of consistency functions without restricting the form and expressiveness of the neural network. However, previous preconditionings are hand-crafted and may be suboptimal choices. In this work, we offer the first theoretical insights into the preconditioning in consistency distillation, by elucidating its design criteria and the connection to the teacher ODE trajectory. Based on these analyses, we further propose a principled way, dubbed Analytic-Precond, to analytically optimize the preconditioning according to the consistency gap (defined as the gap between the teacher denoiser and the optimal student denoiser) on a generalized teacher ODE. We demonstrate that Analytic-Precond can facilitate the learning of trajectory jumpers, enhance the alignment of the student trajectory with the teacher's, and achieve $2\times$ to $3\times$ training acceleration of consistency trajectory models in multi-step generation across various datasets.
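To make the idea of "linearly combining the input data and the network output" concrete, here is a minimal sketch of two hand-crafted preconditionings of the kind the abstract refers to. It assumes the EDM-style coefficients commonly used for consistency models and the $s/t$ ratio commonly used for consistency trajectory models; it is not the Analytic-Precond coefficients, which this page does not state in full. The function names and the values `SIGMA_DATA = 0.5` and `EPS = 0.002` are illustrative assumptions.

```python
import math

SIGMA_DATA = 0.5  # assumed data standard deviation (illustrative)
EPS = 0.002       # assumed smallest time on the trajectory (illustrative)


def cm_coeffs(t, sigma_data=SIGMA_DATA, eps=EPS):
    """Assumed hand-crafted coefficients (alpha, beta) for a consistency
    model f(x, t) = alpha * x + beta * F(x, t)."""
    alpha = sigma_data**2 / ((t - eps) ** 2 + sigma_data**2)
    beta = sigma_data * (t - eps) / math.sqrt(sigma_data**2 + t**2)
    return alpha, beta


def ctm_coeffs(t, s):
    """Assumed hand-crafted coefficients for a trajectory jump from
    time t to time s: (s / t) * x + (1 - s / t) * g(x, t, s)."""
    return s / t, 1.0 - s / t


def precondition(x, net_out, alpha, beta):
    """Linearly combine the input x and the raw network output."""
    return [alpha * xi + beta * oi for xi, oi in zip(x, net_out)]


# Boundary condition: at t = EPS the consistency function returns its
# input unchanged, whatever the network outputs.
x = [1.0, 2.0]
arbitrary_net_out = [5.0, 7.0]
at_boundary = precondition(x, arbitrary_net_out, *cm_coeffs(EPS))
```

In both cases the coefficients reduce to `(1.0, 0.0)` at the boundary (`t = EPS` for the consistency model, `s = t` for the trajectory jump), so the boundary condition holds by construction and the network itself is left unconstrained.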

Paper Structure

This paper contains 29 sections, 1 theorem, 28 equations, 8 figures, 4 tables.

Key Result

Proposition 3.1

Suppose there exists some constant $C>0$ so that the parameters $\{l_t,s_t\}_{t=\epsilon}^T$ are bounded by $|l_t|,|s_t|\leq C$, then the optimal student denoiser function $\bm{D}_{\theta^*}$ under the preconditioning $f(t,s)=\frac{L_tS_t+(l_t-1)(\eta_s-\eta_t)}{L_sS_t}, g(t,s)=\frac{\eta_s-\eta_t}{

Figures (8)

  • Figure 1: Consistency distillation with preconditioning coefficients $\alpha,\beta$.
  • Figure 2: Training curves for single-step generation, and visualization of preconditionings for single-step jump on CIFAR-10 (conditional).
  • Figure 3: Training curves for two-step generation.
  • Figure 4: Visualizations of the preconditioning coefficient $g(t,s)$ for CTM, and for Analytic-Precond under different datasets.
  • Figure 5: Effects of BCM's preconditioning on CTMs.
  • ...and 3 more figures

Theorems & Definitions (2)

  • Proposition 3.1: Bound for the Consistency Gap (proof in the appendix)
  • proof