
Elucidating the Exposure Bias in Diffusion Models

Mang Ning, Mingxiao Li, Jianlin Su, Albert Ali Salah, Itir Onal Ertugrul

TL;DR

This work formalizes exposure bias in diffusion models by analytically characterizing the mismatch between training and sampling distributions and introducing a variance-based metric $\delta_t$. It identifies prediction error in the denoiser as the root cause of sampling drift and proposes a simple, training-free remedy, Epsilon Scaling, to nudge the sampling trajectory toward the learned vector field. Across multiple architectures and sampling schemes, the method consistently improves FID and reduces exposure bias, demonstrating its broad applicability from ADM and EDM to LDM and DiT. The approach offers a practical, low-overhead path to higher-quality generations without retraining, with code and schedules provided for reproducibility.

Abstract

Diffusion models have demonstrated impressive generative capabilities, but their exposure bias problem, described as the input mismatch between training and sampling, lacks in-depth exploration. In this paper, we systematically investigate the exposure bias problem in diffusion models by first analytically modelling the sampling distribution, based on which we then attribute the prediction error at each sampling step as the root cause of the exposure bias issue. Furthermore, we discuss potential solutions to this issue and propose an intuitive metric for it. Along with the elucidation of exposure bias, we propose a simple, yet effective, training-free method called Epsilon Scaling to alleviate the exposure bias. We show that Epsilon Scaling explicitly moves the sampling trajectory closer to the vector field learned in the training phase by scaling down the network output, mitigating the input mismatch between training and sampling. Experiments on various diffusion frameworks (ADM, DDIM, EDM, LDM, DiT, PFGM++) verify the effectiveness of our method. Remarkably, our ADM-ES, as a state-of-the-art stochastic sampler, obtains 2.17 FID on CIFAR-10 under 100-step unconditional generation. The code is available at https://github.com/forever208/ADM-ES and https://github.com/forever208/EDM-ES.
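
The abstract describes Epsilon Scaling as scaling down the network output during sampling. The following is a minimal sketch of that idea, not the authors' implementation (see the linked repositories for that). It assumes a standard DDPM-style ancestral sampling step and a constant divisor. The function name `ddpm_step_with_epsilon_scaling` and the parameter `scale` are hypothetical and chosen only for illustration. Setting `scale=1.0` recovers the plain step, while `scale > 1.0` shrinks the predicted noise before it enters the update.

```python
import math
import random


def ddpm_step_with_epsilon_scaling(x_t, eps_pred, alpha_t, alpha_bar_t,
                                   sigma_t, scale=1.0, rng=None):
    """One DDPM-style ancestral step with the predicted noise scaled down.

    x_t, eps_pred : lists of floats of equal length (current sample and
                    the network's noise prediction for it).
    alpha_t       : per-step alpha (1 - beta_t).
    alpha_bar_t   : cumulative product of alphas up to step t.
    sigma_t       : standard deviation of the noise added at this step.
    scale         : hypothetical scaling factor (an assumption of this
                    sketch); the prediction is divided by it.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    if len(x_t) != len(eps_pred):
        raise ValueError("x_t and eps_pred must have the same length")
    rng = rng or random

    # Coefficient of the noise term in the DDPM posterior mean.
    coef = (1.0 - alpha_t) / math.sqrt(1.0 - alpha_bar_t)
    inv_sqrt_alpha = 1.0 / math.sqrt(alpha_t)

    x_prev = []
    for x, eps in zip(x_t, eps_pred):
        eps_scaled = eps / scale  # scale down the network output
        mean = inv_sqrt_alpha * (x - coef * eps_scaled)
        x_prev.append(mean + sigma_t * rng.gauss(0.0, 1.0))
    return x_prev
```

In a full sampler this function would be called once per timestep with the denoiser's prediction for the current `x_t`. How the factor is chosen, and whether it varies over timesteps, is left open here.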

Paper Structure

This paper contains 33 sections, 21 equations, 15 figures, 19 tables, 3 algorithms.

Figures (15)

  • Figure 1: Variance error in single-step and multi-step sampling.
  • Figure 2: Expectation of $\left\| \pmb{\epsilon}_{\pmb{\theta}}(\cdot) \right\|_2$ during training and 20-step sampling on CIFAR-10. We report the L2-norm using 50k samples at each timestep.
  • Figure 3: $\Delta N (t)$ at each timestep.
  • Figure 4: Exposure bias measured by $\delta_t$ on LSUN 64$\times$64. Epsilon Scaling achieves a smaller $\delta_t$ at the end of sampling ($t=1$).
  • Figure 5: $\left\| \pmb{\epsilon}_{\pmb{\theta}}(\cdot) \right\|_2$ on LSUN 64$\times$64. After applying Epsilon Scaling, the sampling $\left\| \pmb{\epsilon}_{\pmb{\theta}} \right\|_2$ (blue) gets closer to the training $\left\| \pmb{\epsilon}_{\pmb{\theta}} \right\|_2$ (red).
  • ...and 10 more figures