Target-Driven Distillation: Consistency Distillation with Target Timestep Selection and Decoupled Guidance

Cunzheng Wang, Ziyuan Guo, Yuxuan Duan, Huaxia Li, Nemo Chen, Xu Tang, Yao Hu

TL;DR

Diffusion models offer high-quality generation but are slow; existing consistency-distillation methods struggle with blur and detail loss due to naive target-timestep choices. Target-Driven Distillation (TDD) refines this by selecting a restricted, strategically chosen set of target timesteps, using decoupled guidance during training, and enabling non-equidistant sampling and x0 clipping to improve few-step results. The approach yields state-of-the-art performance in 4–8 step generation across standard benchmarks, with robust guidance-scale tuning and stable training. This combination provides a practical pathway to fast, high-fidelity diffusion generation applicable to large-scale models and diverse prompts.

Abstract

Consistency distillation methods have proven highly effective at accelerating generation with diffusion models. However, because previous methods select target timesteps with simple, straightforward strategies, their generated images often suffer from blur and loss of detail. To address these limitations, we introduce Target-Driven Distillation (TDD), which (1) adopts a careful target-timestep selection strategy that increases training efficiency; (2) uses decoupled guidance during training, which leaves the guidance scale open to post-hoc tuning at inference time; and (3) can optionally be equipped with non-equidistant sampling and x0 clipping for more flexible and accurate image sampling. Experiments verify that TDD achieves state-of-the-art performance in few-step generation, making it a stronger option among consistency distillation models.
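
To make the x0 clipping idea concrete, here is a minimal sketch of one deterministic DDIM-style sampling step in which the predicted clean sample is clamped before the next noisy sample is formed. This is an illustration of a common form of x0 clipping, not the paper's exact procedure: the function name `ddim_step_with_x0_clip`, the clipping range of [-1, 1], and the choice to recompute the noise estimate after clamping are all assumptions.

```python
import math

def ddim_step_with_x0_clip(x_t, eps_pred, alpha_bar_t, alpha_bar_s, clip=1.0):
    """One deterministic DDIM-style step from timestep t to s (hypothetical helper).

    x_t, eps_pred : lists of floats (a flattened sample and its predicted noise)
    alpha_bar_t   : cumulative signal level at the current timestep, in (0, 1)
    alpha_bar_s   : cumulative signal level at the next timestep, in (0, 1]
    clip          : assumed valid data range is [-clip, clip]
    """
    sqrt_a_t = math.sqrt(alpha_bar_t)
    sqrt_1m_a_t = math.sqrt(1.0 - alpha_bar_t)
    sqrt_a_s = math.sqrt(alpha_bar_s)
    sqrt_1m_a_s = math.sqrt(1.0 - alpha_bar_s)

    x_s = []
    for x, e in zip(x_t, eps_pred):
        # Predicted clean sample implied by the noise estimate.
        x0 = (x - sqrt_1m_a_t * e) / sqrt_a_t
        # x0 clipping: keep the prediction inside the assumed data range.
        x0 = max(-clip, min(clip, x0))
        # Recompute the noise so that it stays consistent with the clipped x0.
        e_adj = (x - sqrt_a_t * x0) / sqrt_1m_a_t
        # Re-noise the clipped prediction to the next timestep.
        x_s.append(sqrt_a_s * x0 + sqrt_1m_a_s * e_adj)
    return x_s
```

With few sampling steps each step is large, so an out-of-range x0 prediction is carried almost directly into the next sample; clamping it is one simple way to limit that error.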

Paper Structure

This paper contains 23 sections, 18 equations, 6 figures, 1 table, and 1 algorithm.

Figures (6)

  • Figure 1: Comparison of different distillation methods. $\tau^{k_1}_m$ and $\tau^{k_2}_m$ represent a target timestep when divided into $k_1$ and $k_2$, respectively. LCM (a) and PCM (b) are examples of single-target distillation, where $\mathbf{x}_{t_{n}}$ corresponds to only one target timestep. In contrast, CTM (c) and ours (d) are multi-target distillation methods, where $\mathbf{x}_{t_{n}}$ can correspond to multiple target timesteps.
  • Figure 2: Illustration of the TDD distillation training and sampling processes. Panel (a) shows the distillation process, where $\tau^{k}$ represents the equidistant timesteps within segments. Panel (b) compares non-equidistant sampling with standard sampling for 5-step inference.
  • Figure 3: Qualitative comparison of different methods under NFE for 4 to 8 steps.
  • Figure 4: Qualitative comparison between Target-Driven Multi-Target Distillation (TDD, 4-8 step target timesteps distillation) and Single-Target Distillation (PCM, 4-step target timesteps distillation).
  • Figure 5: (a) Ablation comparison of distillation with decoupled guidances. (b) Ablation comparison of guidance scale tuning.
  • ...and 1 more figure
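
The captions of Figures 1, 2, and 4 suggest that TDD draws its target timesteps from the equidistant grids obtained by dividing the trajectory into $k$ segments, for $k$ between 4 and 8. The sketch below builds such a candidate set and picks one target below a given $t_n$. It is an assumption-laden illustration, not the authors' algorithm: the helper names, the 1000-step training schedule, the rounding, and the uniform random choice are all hypothetical.

```python
import random

def equidistant_targets(T, k):
    """Timesteps that split [0, T] into k equal segments, excluding T itself."""
    return [round(T * m / k) for m in range(k)]

def candidate_targets(T=1000, k_min=4, k_max=8):
    """Union of the equidistant grids for every k in [k_min, k_max] (assumed range)."""
    targets = set()
    for k in range(k_min, k_max + 1):
        targets.update(equidistant_targets(T, k))
    return sorted(targets)

def sample_target(t_n, targets, rng=random):
    """Pick one candidate target strictly below t_n (uniform choice is an assumption)."""
    below = [tau for tau in targets if tau < t_n]
    if not below:
        raise ValueError("no candidate target timestep below t_n")
    return rng.choice(below)

# Usage: the same x_{t_n} can be paired with several different targets.
targets = candidate_targets()
tau = sample_target(600, targets, random.Random(0))
```

Restricting targets to this union, rather than to a single grid (single-target distillation) or to every earlier timestep, is how the sketch reflects the multi-target setting described in Figure 1.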