S$^{2}$-DMs: Skip-Step Diffusion Models

Yixuan Wang, Shuangyin Li

TL;DR

Diffusion models suffer when accelerated sampling omits intermediate steps: the model is trained step by step but sampled with skips, and this training–sampling mismatch can degrade sample quality. S$^{2}$-DMs address this by adding a skip-step loss $L_{skip}$ to the standard loss $L_0$, forming a balanced objective $L=\tau L_0 + (1-\tau) L_{skip}$, where $L_{skip}$ is derived to align skipped-step predictions with the sampling trajectory via the term $\alpha_{skip}$. Empirically, S$^{2}$-DMs deliver consistent improvements over the standard training of DDIMs, PNDMs, and DEIS on CIFAR10 and CelebA across various numbers of sampling steps, using the same sampling algorithms: FID improves by $3.27\%$ to $14.06\%$ on CIFAR10 and by $8.97\%$ to $27.08\%$ on CelebA. The approach is simple to implement, requires only minor code changes, maintains compatibility with existing accelerated samplers, and supports high-quality sample generation with fewer steps as well as latent-space interpolation.
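
The balanced objective above can be sketched in a few lines. This is a minimal illustration only: it assumes the two loss terms have already been computed as scalars and that $\tau$ lies in $[0, 1]$, which the summary does not state explicitly. The function name `combined_loss` and the example values are hypothetical, and the exact form of $L_{skip}$ (including $\alpha_{skip}$) is defined in the paper, not reproduced here.

```python
def combined_loss(l0: float, l_skip: float, tau: float) -> float:
    """Return the weighted objective L = tau * L_0 + (1 - tau) * L_skip.

    l0     -- value of the standard diffusion loss L_0
    l_skip -- value of the skip-step loss L_skip (computed elsewhere)
    tau    -- balancing weight; assumed here to lie in [0, 1]
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau is assumed to lie in [0, 1]")
    return tau * l0 + (1.0 - tau) * l_skip


# Hypothetical scalar loss values, for illustration only.
loss = combined_loss(l0=0.2, l_skip=0.4, tau=0.5)
```

With `tau = 1.0` the objective reduces to the standard loss $L_0$, so under this reading the weight controls how strongly the skip-step term influences training.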

Abstract

Diffusion models have emerged as powerful generative tools, rivaling GANs in sample quality and mirroring the likelihood scores of autoregressive models. A subset of these models, exemplified by DDIMs, exhibits an inherent asymmetry: they are trained over $T$ steps but sample from only a subset of those $T$ steps during generation. This selective sampling, though optimized for speed, discards information from the unsampled steps, which can compromise sample quality. To address this issue, we present S$^{2}$-DMs, a new training method that uses a skip-step loss $L_{skip}$ designed to reintegrate the information omitted during selective sampling. The approach enhances sample quality, is simple to implement, requires minimal code modifications, and is compatible with various sampling algorithms. On the CIFAR10 dataset, models trained using our algorithm showed an improvement of 3.27% to 14.06% over models trained with traditional methods across various sampling algorithms (DDIMs, PNDMs, DEIS) and different numbers of sampling steps (10, 20, ..., 1000). On the CelebA dataset, the improvement ranged from 8.97% to 27.08%. The code and additional resources are available on GitHub.
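
To make the asymmetry concrete, the sketch below picks the timesteps a skip sampler would visit and lists the ones it never sees. It assumes uniform spacing with stride `T // num_steps`, a common choice for DDIM-style samplers but not one confirmed by this summary; the function names `sampling_timesteps` and `skipped_timesteps` are hypothetical.

```python
from typing import List


def sampling_timesteps(T: int, num_steps: int) -> List[int]:
    """Return an evenly spaced subsequence of the training steps 0..T-1.

    Uniform spacing is an assumption made for illustration.
    """
    if not 1 <= num_steps <= T:
        raise ValueError("num_steps must satisfy 1 <= num_steps <= T")
    stride = T // num_steps
    # Truncate in case T is not an exact multiple of num_steps.
    return list(range(0, T, stride))[:num_steps]


def skipped_timesteps(T: int, visited: List[int]) -> List[int]:
    """Return the training steps that the sampler never visits."""
    visited_set = set(visited)
    return [t for t in range(T) if t not in visited_set]


# Example: a model trained over 1000 steps, sampled with only 10.
steps = sampling_timesteps(T=1000, num_steps=10)
unused = skipped_timesteps(1000, steps)
```

In this example the sampler visits 10 of the 1000 training steps, leaving 990 unvisited; these are the unsampled steps whose information the abstract describes as lost.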

Paper Structure

This paper contains 15 sections, 12 equations, 7 figures, 4 tables, 2 algorithms.

Figures (7)

  • Figure 1: Overview of Different Sampling Modes in Diffusion Models. Sampling is generally divided into step-by-step sampling and skip sampling. However, all diffusion model training processes are conducted step by step, so skip sampling can lead to the loss of intermediate information, resulting in poor sample quality.
  • Figure 2: The directed graphical model of the S$^2$-DMs.
  • Figure 3: Overview of the reverse process, comparing step-by-step models (DDPMs, etc.), skip models (DDIMs, PNDMs, DEIS, etc.), and our S$^2$-DM training method. Step-by-step models predict noise at each step for sampling, whereas skip models accelerate the process by using the current noise prediction to jump several steps ahead. A model trained with the S$^2$-DM method integrates skip-step information during training and, like the skip models, predicts the current noise. However, the noise it predicts more closely resembles what a step-by-step model would predict several steps ahead, reducing the gap and enhancing sampling quality.
  • Figure 4: FID scores for the step ablation on CIFAR10 and CelebA. The impact of skip steps on the model was examined by varying the skip values among {50, 10, 2} based on DDIMs.
  • Figure 5: Visualization of efficiency-effectiveness analysis.
  • ...and 2 more figures