DDIL: Diversity Enhancing Diffusion Distillation With Imitation Learning
Risheek Garrepalli, Shweta Mahajan, Munawar Hayat, Fatih Porikli
TL;DR
The paper tackles slow sampling in diffusion models and the quality-diversity trade-off observed in multi-step distillation, identifying covariate shift as a key bottleneck. It introduces Diffusion Distillation with Imitation Learning (DDIL), a framework that preserves the marginal data distribution by training on forward-diffused samples from $p_{data}$ and mitigates compounding error by also training on backward trajectories from both the teacher and the student, with a reflected diffusion formulation for stability. DDIL integrates with existing distillation approaches (PD, LCM, DMD2) through mixed rollouts and a dataset-aggregation-style feedback mechanism, implemented with a lightweight replay buffer. Empirically, DDIL improves sample fidelity (e.g., FID) and diversity (Density-Coverage, LPIPS-Diversity) over the baselines on COCO datasets while maintaining or reducing training overhead, and targeted analyses offer insight into covariate shift. The work demonstrates that integrating imitation-learning principles with diffusion distillation yields robust, scalable gains in both quality and diversity, with practical impact for efficient, high-quality text-to-image generation.
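To make the mixed-rollout idea concrete, here is a minimal, hypothetical sketch of how training inputs could be drawn from three sources: forward-diffused real samples, and states visited by teacher or student backward rollouts kept in a fixed-size replay buffer. The cosine noise schedule, the list-of-floats sample representation, the toy sampler, and every name (`forward_diffuse`, `rollout`, `ReplayBuffer`, `sample_training_state`, the mixing weights `probs`) are assumptions for illustration, not the paper's implementation; the distillation loss itself is omitted.

```python
import math
import random
from collections import deque

# Hypothetical sketch: a "sample" is a list of floats; t runs from 1.0 (noise) to 0.0 (data).

def forward_diffuse(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) under an assumed variance-preserving cosine schedule."""
    alpha, sigma = math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)
    return [alpha * v + sigma * rng.gauss(0.0, 1.0) for v in x0]

def rollout(step_fn, x_T, times):
    """Run a sampler backward over descending `times`, recording each visited (t, x_t)."""
    states, x = [], x_T
    for t, t_next in zip(times[:-1], times[1:]):
        states.append((t, x))
        x = step_fn(x, t, t_next)
    return states

class ReplayBuffer:
    """Fixed-capacity FIFO store of (t, x_t) states from backward rollouts."""

    def __init__(self, capacity):
        self.states = deque(maxlen=capacity)

    def add(self, states):
        self.states.extend(states)  # oldest states are evicted once the buffer is full

    def sample(self, rng):
        return rng.choice(self.states)

def sample_training_state(x0, buffers, probs, times, rng):
    """Choose the source of the next training input.

    probs = (p_data, p_teacher, p_student). A rollout source whose buffer is
    still empty falls back to forward diffusion of the real sample x0.
    """
    source = rng.choices(["data", "teacher", "student"], weights=probs)[0]
    if source == "data" or not buffers[source].states:
        t = rng.choice(times[:-1])  # any noise level except t = 0
        return "data", t, forward_diffuse(x0, t, rng)
    t, x_t = buffers[source].sample(rng)
    return source, t, x_t

def toy_student(x, t, t_next):
    """Hypothetical stand-in for a real student sampler: shrinks x toward 0."""
    return [v * t_next / t for v in x]

# Usage: fill the student buffer from one rollout, then draw a training state.
rng = random.Random(0)
times = [1.0, 0.75, 0.5, 0.25, 0.0]
buffers = {"teacher": ReplayBuffer(100), "student": ReplayBuffer(100)}
buffers["student"].add(rollout(toy_student, [rng.gauss(0.0, 1.0) for _ in range(4)], times))
source, t, x_t = sample_training_state([0.5, -0.5, 0.1, 0.0], buffers, (0.5, 0.0, 0.5), times, rng)
```

Setting the student weight to zero trains only on forward-diffused data and teacher trajectories; raising it exposes the student to the states its own sampler actually visits, which is the covariate-shift correction the summary describes.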
Abstract
Diffusion models excel at generative modeling (e.g., text-to-image), but sampling requires multiple denoising network passes, which limits their practicality. Efforts such as progressive distillation and consistency distillation have shown promise by reducing the number of passes, at the expense of sample quality. In this work we identify covariate shift, which causes errors to compound at inference time, as one reason for the poor performance of multi-step distilled models. To address covariate shift, we formulate diffusion distillation within an imitation learning framework (DDIL) and enhance the training distribution for distilling diffusion models with both the data distribution (forward diffusion) and student-induced distributions (backward diffusion). Training on the data distribution helps diversify generations by preserving the marginal data distribution, and training on the student distribution addresses compounding error by correcting covariate shift. In addition, we adopt a reflected diffusion formulation for distillation and demonstrate improved performance and stable training across different distillation methods. We show that DDIL consistently improves on the baseline algorithms of progressive distillation (PD), Latent Consistency Models (LCM), and Distribution Matching Distillation (DMD2).
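A reflected diffusion keeps samples inside a bounded domain, such as images normalized to [-1, 1], by mirroring any trajectory that crosses the boundary back inside. As a generic illustration only (not the paper's specific formulation), the sketch below takes one Euler-Maruyama step of an SDE and reflects each coordinate into the domain; the helper names `reflect` and `reflected_euler_step`, the toy drift, and the bounds are assumptions. Simple clipping would also keep values in range, but it maps every overshoot onto the boundary itself, whereas reflection does not pile mass there.

```python
import math
import random

def reflect(x, lo=-1.0, hi=1.0):
    """Mirror x back into [lo, hi], reflecting repeatedly if it overshoots by more than one width."""
    width = hi - lo
    y = (x - lo) % (2.0 * width)  # position within one reflection period
    if y > width:  # second half of the period is the mirror image
        y = 2.0 * width - y
    return lo + y

def reflected_euler_step(x, drift, g, dt, rng, lo=-1.0, hi=1.0):
    """One Euler-Maruyama step of dx = drift(x) dt + g dW, reflected so each entry stays in [lo, hi]."""
    return [
        reflect(v + drift(v) * dt + g * math.sqrt(dt) * rng.gauss(0.0, 1.0), lo, hi)
        for v in x
    ]

# Example: a large noisy step near the boundaries still lands inside [-1, 1].
rng = random.Random(0)
x_next = reflected_euler_step([0.9, -0.9, 0.0], drift=lambda v: -v, g=1.0, dt=0.5, rng=rng)
```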
