DDIL: Diversity Enhancing Diffusion Distillation With Imitation Learning

Risheek Garrepalli, Shweta Mahajan, Munawar Hayat, Fatih Porikli

TL;DR

The paper tackles slow sampling in diffusion models and the quality-diversity trade-off observed in multi-step distillation, identifying covariate shift as a key bottleneck. It introduces Diffusion Distillation with Imitation Learning (DDIL), a framework that preserves the marginal data distribution by training on forward-diffused samples from $p_{data}$ and mitigates compounding errors by also training on backward trajectories from both teacher and student, aided by a reflected diffusion formulation for stability. DDIL can be combined with existing distillation approaches (PD, LCM, DMD2) via mixed rollouts and a dataset-aggregation-style feedback mechanism, implemented with a lightweight replay buffer. Empirically, DDIL improves sample fidelity (e.g., FID) and diversity (Density-Coverage, LPIPS-Diversity) across baselines on COCO datasets, while maintaining or reducing training overhead, and it provides insight into covariate shift through targeted analyses. The work demonstrates that integrating imitation-learning principles with diffusion distillation yields gains in both quality and diversity, with practical impact for efficient, high-quality text-to-image generation.

Abstract

Diffusion models excel at generative modeling (e.g., text-to-image), but sampling requires multiple denoising network passes, limiting practicality. Efforts such as progressive distillation or consistency distillation have shown promise by reducing the number of passes, at the expense of the quality of the generated samples. In this work we identify covariate shift as one reason for the poor performance of multi-step distilled models, arising from compounding error at inference time. To address covariate shift, we formulate diffusion distillation within an imitation learning framework (DDIL) and enhance the training distribution for distilling diffusion models by training on both the data distribution (forward diffusion) and student-induced distributions (backward diffusion). Training on the data distribution helps to diversify the generations by preserving the marginal data distribution, and training on the student distribution addresses compounding error by correcting covariate shift. In addition, we adopt a reflected diffusion formulation for distillation and demonstrate improved performance and stable training across different distillation methods. We show that DDIL consistently improves on the baseline algorithms of progressive distillation (PD), latent consistency models (LCM), and Distribution Matching Distillation (DMD2).
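The idea of mixing training states from forward diffusion with states from backward rollouts can be illustrated with a toy sketch. This is not the authors' implementation: the scalar "images", the linear noise schedule, the mixing probabilities `p_forward` and `p_teacher`, and all function names are hypothetical choices made for illustration. Only the state-sampling step is shown; the distillation loss and the teacher query on each sampled state are omitted.

```python
import random
from collections import deque

def forward_diffuse(x0, t, T, rng):
    """Toy forward diffusion: blend a clean scalar with Gaussian noise."""
    alpha = 1.0 - t / T  # hypothetical linear schedule, pure noise at t == T
    return alpha * x0 + (1.0 - alpha) * rng.gauss(0.0, 1.0)

def backward_rollout(denoise_step, t, T, rng):
    """Unroll a denoiser from pure noise at step T down to step t."""
    x = rng.gauss(0.0, 1.0)
    for s in range(T, t, -1):
        x = denoise_step(x, s)
    return x

def sample_training_state(data, teacher_step, student_step, buffer, T, rng,
                          p_forward=0.5, p_teacher=0.25):
    """Choose the source of the next training state x_t (assumed mixing rule)."""
    t = rng.randint(1, T)
    u = rng.random()
    if u < p_forward:
        source, x_t = "forward", forward_diffuse(rng.choice(data), t, T, rng)
    elif u < p_forward + p_teacher:
        source, x_t = "teacher", backward_rollout(teacher_step, t, T, rng)
    else:
        source, x_t = "student", backward_rollout(student_step, t, T, rng)
    if source != "forward":
        # Aggregate visited backward states in a bounded replay buffer.
        buffer.append((x_t, t))
    return source, x_t, t

rng = random.Random(0)
data = [-1.0, 1.0]                        # toy "dataset" of clean samples
teacher_step = lambda x, s: 0.9 * x       # stand-in for one teacher denoising step
student_step = lambda x, s: 0.8 * x       # stand-in for one student denoising step
buffer = deque(maxlen=64)
counts = {"forward": 0, "teacher": 0, "student": 0}
for _ in range(1000):
    source, x_t, t = sample_training_state(
        data, teacher_step, student_step, buffer, T=4, rng=rng)
    counts[source] += 1
print(counts, len(buffer))
```

The forward branch keeps the student exposed to states derived from real data, while the student branch exposes it to the states it will actually visit at inference time, which is where covariate shift arises.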

Paper Structure

This paper contains 26 sections, 2 equations, 7 figures, 10 tables, 3 algorithms.

Figures (7)

  • Figure 1: DDIL consistently improves both sample quality (FID@30k) and diversity (Coverage) across distillation approaches. Performance is shown for DMD2 applied to SSD1B, and for Consistency/Progressive Distillation applied to SDv1.5. Coverage measures the extent to which the generated samples span the real data manifold.
  • Figure 2: Qualitative comparison of images generated with different distillation techniques. DDIL yields more coherent structure than the DMD2 baseline, even with thresholding, e.g., in the space station and motorcycle structures. All distilled models are trained on the same dataset with the same batch size and evaluated with the same seed, so the generations share characteristics.
  • Figure 3: Predictions at different timesteps for different distillation frameworks: (a) The standard progressive distillation training framework, where the student always sees forward-diffused latents. (b) Unrolling within our framework, which in addition to (a) obtains distillation feedback by querying the teacher (green) on the backward trajectory.
  • Figure 4: Qualitative comparison of images generated with different distillation techniques. DDIL improves progressive distillation (PD); e.g., the 'astronaut' is slightly disfigured with PD, while the DDIL (+PD) result is of good quality.
  • Figure 5: Sensitivity of timestep in reverse process
  • ...and 2 more figures
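The Coverage metric mentioned in the Figure 1 caption can be made concrete with a small sketch. It assumes the k-nearest-neighbour definition commonly used for the Density-Coverage metrics: a real sample counts as covered if at least one generated sample falls inside the ball whose radius is the distance to its k-th nearest real neighbour. The 2-D points, the choice `k=1`, and the function names are illustrative assumptions; in practice the metric is computed on feature embeddings, and the paper's exact settings are not stated here.

```python
import math

def kth_nn_radius(points, i, k):
    """Distance from points[i] to its k-th nearest neighbour among the other points."""
    dists = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)
    return dists[k - 1]

def coverage(real, fake, k=1):
    """Fraction of real points whose k-NN ball contains at least one generated point."""
    covered = 0
    for i, x in enumerate(real):
        radius = kth_nn_radius(real, i, k)
        if any(math.dist(x, y) <= radius for y in fake):
            covered += 1
    return covered / len(real)

# Two real clusters; one generator collapses onto the left cluster only.
real = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
collapsed = [(0.1, 0.0), (0.9, 0.0)]
diverse = [(0.1, 0.0), (10.5, 0.0)]

print(coverage(real, collapsed))  # 0.5
print(coverage(real, diverse))    # 1.0
```

A generator that misses a mode of the real data leaves those real samples uncovered, so lower Coverage signals reduced diversity even when individual samples look sharp.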