S$^{2}$-DMs: Skip-Step Diffusion Models
Yixuan Wang, Shuangyin Li
TL;DR
Accelerated sampling in diffusion models omits intermediate steps, creating a training-sampling mismatch that can degrade sample quality. S$^{2}$-DMs address this by adding a skip-step loss $L_{skip}$ to the standard loss $L_0$, forming a balanced objective $L=\tau L_0 + (1-\tau) L_{skip}$, where $L_{skip}$ is derived to align skipped-step predictions with the sampling trajectory via the term $\alpha_{skip}$. Empirically, S$^{2}$-DMs deliver consistent improvements over standard training when sampled with DDIMs, PNDMs, and DEIS on CIFAR10 and CelebA across various numbers of sampling steps: FID improves by $3.27\%$ to $14.06\%$ on CIFAR10 and by $8.97\%$ to $27.08\%$ on CelebA, using the same sampling algorithms. The approach is simple to implement, requires only minor code changes, remains compatible with existing accelerated samplers, and supports high-quality sample generation with fewer steps as well as latent-space interpolation.
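The balanced objective above can be illustrated with a minimal sketch. It shows only the weighting $L=\tau L_0 + (1-\tau) L_{skip}$; how $L_{skip}$ itself is computed is not reproduced here. The function name `combined_loss` is hypothetical, and restricting $\tau$ to $[0, 1]$ is an assumption suggested by the word "balanced", not something stated in the text.

```python
def combined_loss(l0: float, l_skip: float, tau: float) -> float:
    """Weighted sum L = tau * L_0 + (1 - tau) * L_skip.

    l0     : value of the standard loss L_0 for the current batch
    l_skip : value of the skip-step loss L_skip for the current batch
    tau    : balancing weight (assumed here to lie in [0, 1])
    """
    if not 0.0 <= tau <= 1.0:
        # Assumption: the objective is treated as a convex combination.
        raise ValueError("tau is assumed to lie in [0, 1]")
    return tau * l0 + (1.0 - tau) * l_skip


# tau = 1 recovers the standard loss; tau = 0 uses only the skip-step loss.
standard_only = combined_loss(2.0, 5.0, tau=1.0)
skip_only = combined_loss(2.0, 5.0, tau=0.0)
balanced = combined_loss(1.0, 3.0, tau=0.5)
```

Under this reading, $\tau$ controls how strongly training is pulled toward the skipped-step trajectory relative to the ordinary per-step objective.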
Abstract
Diffusion models have emerged as powerful generative tools, rivaling GANs in sample quality and mirroring the likelihood scores of autoregressive models. A subset of these models, exemplified by DDIMs, exhibits an inherent asymmetry: they are trained over $T$ steps but sample from only a subset of those $T$ steps during generation. This selective sampling, though optimized for speed, discards information from the unsampled steps and can compromise sample quality. To address this issue, we present S$^{2}$-DMs, a new training method that uses a skip-step loss $L_{skip}$ designed to reintegrate the information omitted during selective sampling. The approach has several benefits: it notably enhances sample quality, is simple to implement, requires minimal code modifications, and is compatible with various sampling algorithms. On the CIFAR10 dataset, models trained with our algorithm improved FID by 3.27% to 14.06% over models trained with traditional methods across various sampling algorithms (DDIMs, PNDMs, DEIS) and different numbers of sampling steps (10, 20, ..., 1000). On the CelebA dataset, the improvement ranged from 8.97% to 27.08%. The code and additional resources are available on GitHub.
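To make the asymmetry between training and sampling concrete, the sketch below picks an evenly spaced sub-sequence of $S$ timesteps out of the $T$ training steps and counts how many steps are never visited at sampling time. Uniform spacing is an assumption chosen for illustration (it is one common choice for accelerated samplers), and the helper name `uniform_subsequence` is hypothetical.

```python
def uniform_subsequence(num_train_steps: int, num_sample_steps: int) -> list[int]:
    """Return num_sample_steps evenly spaced timesteps from range(num_train_steps).

    Assumption: uniform spacing; other schedules are possible.
    """
    if not 1 <= num_sample_steps <= num_train_steps:
        raise ValueError("need 1 <= num_sample_steps <= num_train_steps")
    # Integer arithmetic keeps the indices strictly increasing and below T.
    return [i * num_train_steps // num_sample_steps for i in range(num_sample_steps)]


T = 1000  # training steps
S = 10    # sampling steps
visited = uniform_subsequence(T, S)
skipped = T - len(visited)  # steps seen during training but not during sampling
```

With $T = 1000$ and $S = 10$, the sampler visits timesteps 0, 100, ..., 900 and skips the remaining 990, which are the steps whose information the skip-step loss is intended to recover.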
