Alleviating Exposure Bias in Diffusion Models through Sampling with Shifted Time Steps
Mingxiao Li, Tingyu Qu, Ruicong Yao, Wei Sun, Marie-Francine Moens
TL;DR
This work addresses exposure bias in diffusion probabilistic models by proposing a training-free Time-Shift Sampler (TS). During sampling, TS adaptively selects the next denoising time step $t_s$ from a window around the nominal next step $t-1$, and a cutoff $t_c$ bounds where shifting is applied. The authors derive a variance-based criterion for the optimal $t_s$ and show how to integrate TS into existing samplers (DDPM, DDIM, S-PNDM, F-PNDM) with minimal overhead. Empirically, TS yields consistent and substantial FID improvements on CIFAR-10 and CelebA across multiple backbones and step counts, including an FID of 3.88 at 10 steps for F-PNDM on CIFAR-10, which in some cases outperforms the vanilla 100-step DDIM. The approach requires no retraining and remains compatible with high-order solvers, making it a practical route to efficient, high-quality image generation. TS also compares favorably against ADM-IP, a training-based method that requires retraining the model.
Abstract
Diffusion Probabilistic Models (DPMs) have shown remarkable efficacy in synthesizing high-quality images. However, their inference process typically requires many iterative steps, potentially hundreds, which can exacerbate the exposure bias caused by the discrepancy between training and inference. Previous work has attempted to mitigate this issue by perturbing inputs during training, which requires retraining the DPM. In this work, we conduct a systematic study of exposure bias in DPMs and find that it can be alleviated with a novel sampling method, without retraining the model. We show empirically and theoretically that, during inference, for each backward time step $t$ and corresponding state $\hat{x}_t$, there may exist another time step $t_s$ that couples better with $\hat{x}_t$. Based on this finding, we introduce a sampling method named Time-Shift Sampler. Our framework can be seamlessly integrated into existing sampling algorithms, such as DDPM, DDIM and other high-order solvers, at only minimal additional computational cost. Experimental results show that our method brings significant and consistent improvements in FID scores across different datasets and sampling methods. For example, integrating Time-Shift Sampler into F-PNDM yields an FID of 3.88 on CIFAR-10 with 10 sampling steps, a 44.49\% improvement over F-PNDM, and better than the vanilla DDIM with 100 sampling steps. Our code is available at https://github.com/Mingxiao-Li/TS-DPM.
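The time-step selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the linear beta schedule, the default window and cutoff values, and the choice to shift only above the cutoff are all assumptions made for illustration. The target variance at step $\tau$ is approximated here by the forward-process noise variance $1-\bar{\alpha}_\tau$, which may differ from the exact criterion derived in the paper.

```python
from statistics import pvariance


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def select_shifted_step(x_flat, t_next, alpha_bar, window=10, cutoff=100):
    """Hypothetical helper: choose the step whose noise variance best matches x_flat.

    x_flat  : flattened values of the current sample (list of floats)
    t_next  : nominal next time step (t - 1 in the text)
    window  : total width of the search window around t_next (assumed default)
    cutoff  : at or below this step no shift is applied (assumed convention)
    """
    if t_next <= cutoff:
        return t_next
    lo = max(0, t_next - window // 2)
    hi = min(len(alpha_bar) - 1, t_next + window // 2)
    var_x = pvariance(x_flat)
    # Assumed criterion: minimize |var(x) - (1 - alpha_bar_tau)| over the window.
    return min(range(lo, hi + 1), key=lambda tau: abs(var_x - (1.0 - alpha_bar[tau])))


if __name__ == "__main__":
    alpha_bar = linear_alpha_bar()
    # Toy sample whose variance equals the noise variance at step 503.
    s = (1.0 - alpha_bar[503]) ** 0.5
    x = [s, -s] * 8
    print(select_shifted_step(x, t_next=500, alpha_bar=alpha_bar))
```

In an existing sampler loop, the returned step would replace the nominal step when querying the noise-prediction network; the rest of the update rule is unchanged, which is why the extra cost is limited to one variance computation and a small search per step.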
