Simple and Fast Distillation of Diffusion Models

Zhenyu Zhou, Defang Chen, Can Wang, Chun Chen, Siwei Lyu

TL;DR

This work proposes Simple and Fast Distillation (SFD) of diffusion models, which simplifies the paradigm used in existing methods and shortens their fine-tuning time by up to 1000$\times$.

Abstract

Diffusion-based generative models have demonstrated powerful performance across various tasks, but this comes at the cost of slow sampling. To achieve both efficient and high-quality synthesis, various distillation-based accelerated sampling methods have been developed recently. However, they generally require time-consuming fine-tuning with elaborate designs to achieve satisfactory performance at a specific number of function evaluations (NFE), making them difficult to employ in practice. To address this issue, we propose Simple and Fast Distillation (SFD) of diffusion models, which simplifies the paradigm used in existing methods and shortens their fine-tuning time by up to 1000$\times$. We begin with a vanilla distillation-based sampling method and boost its performance to the state of the art by identifying and addressing several small yet vital factors affecting synthesis efficiency and quality. Our method can also achieve sampling with variable NFEs using a single distilled model. Extensive experiments demonstrate that SFD strikes a good balance between sample quality and fine-tuning cost in few-step image generation tasks. For example, SFD achieves 4.53 FID (NFE=2) on CIFAR-10 with only 0.64 hours of fine-tuning on a single NVIDIA A100 GPU. Our code is available at https://github.com/zju-pi/diff-sampler.

Paper Structure

This paper contains 21 sections, 7 equations, 18 figures, 10 tables, and 12 algorithms.

Figures (18)

  • Figure 1: Comparison of acceleration methods on diffusion models. For better visualization, the time axis is shifted by adding one hour to the actual time required. Our method achieves good performance with a small fine-tuning cost. Note that it takes about 200 hours to train a diffusion model from scratch in this setting.
  • Figure 2: Comparison of images synthesized by Stable Diffusion v1.5 (Rombach et al., 2022) with guidance scale 7.5.
  • Figure 3: MODEL($\psi_n$) is trained to match the teacher's sampling trajectory at $t_n$ but can enhance the matching at untrained timestamps. The time schedule follows the polynomial schedule with $\rho=7, t_0=0.002, t_4=80$.
  • Figure 4: Ablation studies of 2-NFE distillation on CIFAR-10. The FID is evaluated on 50,000 generated samples with the same latent encodings and is reported every 10 iterations. We achieve the best performance with SFD, a DPM-Solver++(3M) teacher, AFS, $t_{\min}=0.006$ and L1 loss.
  • Figure 5: Ablation study on $t_{\min}$ with DPM++(3M).
  • ...and 13 more figures
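
The polynomial time schedule mentioned in the caption of Figure 3 can be sketched as follows. This is a minimal illustration, assuming the caption refers to the commonly used EDM-style schedule $t_n = \left(t_0^{1/\rho} + \frac{n}{N}\left(t_N^{1/\rho} - t_0^{1/\rho}\right)\right)^{\rho}$ with $t_0$ the smallest and $t_N$ the largest timestamp; the function name and argument names are hypothetical and not taken from the authors' code.

```python
def polynomial_schedule(t_min=0.002, t_max=80.0, num_steps=4, rho=7.0):
    """Return num_steps + 1 timestamps t_0 < ... < t_N (assumed EDM-style schedule).

    Interpolates linearly between t_min**(1/rho) and t_max**(1/rho),
    then raises each point back to the power rho.
    """
    lo = t_min ** (1.0 / rho)
    hi = t_max ** (1.0 / rho)
    return [(lo + n / num_steps * (hi - lo)) ** rho for n in range(num_steps + 1)]


if __name__ == "__main__":
    # Caption values: rho = 7, t_0 = 0.002, t_4 = 80.
    for n, t in enumerate(polynomial_schedule()):
        print(f"t_{n} = {t:.4f}")
```

With $\rho=7$ the timestamps cluster near $t_0$, so most of the resolution is spent at low noise levels; with $\rho=1$ the same function gives a uniform grid.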