Alleviating Exposure Bias in Diffusion Models through Sampling with Shifted Time Steps

Mingxiao Li, Tingyu Qu, Ruicong Yao, Wei Sun, Marie-Francine Moens

TL;DR

This work addresses exposure bias in diffusion probabilistic models with a training-free Time-Shift Sampler (TS). During sampling, TS replaces the next denoising time step with a shifted step $t_s$ chosen from a window around the current step $t-1$, and it stops shifting at a cutoff step $t_c$. The authors derive a variance-based criterion for selecting $t_s$ and show how to integrate TS into existing samplers (DDPM, DDIM, S-PNDM, F-PNDM) with minimal overhead. Empirically, TS consistently improves FID on CIFAR-10 and CelebA across multiple backbones and step counts; for example, F-PNDM with TS reaches $FID=3.88$ at 10 steps on CIFAR-10, better than vanilla DDIM with 100 steps. The approach requires no retraining, remains compatible with high-order solvers, and compares favorably with the training-based ADM-IP method.

Abstract

Diffusion Probabilistic Models (DPM) have shown remarkable efficacy in the synthesis of high-quality images. However, their inference process typically requires many iterative steps, potentially hundreds, which can exacerbate the problem of exposure bias caused by the discrepancy between training and inference. Previous work has attempted to mitigate this issue by perturbing inputs during training, which requires retraining the DPM. In this work, we conduct a systematic study of exposure bias in DPM and find that it can be alleviated with a novel sampling method, without retraining the model. We show empirically and theoretically that, during inference, for each backward time step $t$ and corresponding state $\hat{x}_t$, there might exist another time step $t_s$ that couples better with $\hat{x}_t$. Based on this finding, we introduce a sampling method named Time-Shift Sampler. Our framework can be seamlessly integrated into existing sampling algorithms, such as DDPM, DDIM and other high-order solvers, at the cost of only minimal additional computation. Experimental results show that our method brings significant and consistent improvements in FID scores across different datasets and sampling methods. For example, integrating Time-Shift Sampler into F-PNDM yields an FID of 3.88 on CIFAR-10 with 10 sampling steps, a 44.49\% improvement over F-PNDM and a better score than vanilla DDIM with 100 sampling steps. Our code is available at https://github.com/Mingxiao-Li/TS-DPM.
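
To make the idea of a time step that "couples better" with the current state concrete, here is a minimal sketch of a variance-matching search. It is not the authors' implementation (see the linked repository for that). It assumes a linear beta schedule, 0-based time indices, and a criterion that compares the sample variance of the state with the forward-process noise variance $1-\bar{\alpha}_\tau$ for each candidate $\tau$ in the window. The function names, the default window size, and the default cutoff are hypothetical.

```python
import numpy as np


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def select_shifted_step(x, t_next, alpha_bar, window=10, cutoff=100):
    """Return the time step near t_next whose noise variance best matches x.

    Assumptions (not taken from the paper's text):
      * time steps are 0-based indices into alpha_bar;
      * shifting is only applied while t_next is above the cutoff;
      * the match is the absolute difference between np.var(x) and
        1 - alpha_bar[tau] for each candidate tau in the window.
    """
    if t_next <= cutoff:
        # Past the cutoff: keep the sampler's original schedule.
        return t_next

    # Candidate steps in [t_next - window/2, t_next + window/2], clipped
    # to the valid index range of the schedule.
    lo = max(t_next - window // 2, 0)
    hi = min(t_next + window // 2, len(alpha_bar) - 1)
    candidates = np.arange(lo, hi + 1)

    target_var = 1.0 - alpha_bar[candidates]
    sample_var = np.var(x)

    # Pick the candidate whose expected variance is closest to the sample's.
    best = np.argmin(np.abs(sample_var - target_var))
    return int(candidates[best])
```

In a sampler loop, a call such as `select_shifted_step(x_prev, t - 1, alpha_bar)` would replace the fixed next step `t - 1`. The only extra work per step is one variance computation and a search over the window, which is in line with the small overhead the abstract describes.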

Paper Structure

This paper contains 26 sections, 24 equations, 21 figures, 10 tables, 3 algorithms.

Figures (21)

  • Figure 1: Comparison of TS-DDPM (ours) and DDPM. The orange and blue arrows denote the time-state coupling at each denoising step of TS-DDPM and DDPM, respectively. In TS-DDPM, we search for the coupled time step within the window $[t-w/2, t+w/2]$ until the cutoff time step $t_c$ is reached.
  • Figure 2: Density of the variance of 5000 CIFAR-10 samples at different time steps.
  • Figure 3: CIFAR-10 prediction errors of training samples for different numbers of sampling steps.
  • Figure 4: The training and inference discrepancy of DDIM with 10 sampling steps on CIFAR-10. The dashed line in each column marks the coupling of the predicted $\hat{x}_{t}$ with time step $t$. Points to the right of the dashed line indicate time steps that couple better with $\hat{x}_{t}$ than time step $t$ does.
  • Figure 5: Sampling time vs. FID on CIFAR-10 using DDPM as the backbone with various sampling methods. For each sampler, we report results for {5,10,20,50} sampling steps from left to right, each marked with a "$\times$" symbol.
  • ...and 16 more figures