Input Perturbation Reduces Exposure Bias in Diffusion Models

Mang Ning, Enver Sangineto, Angelo Porrello, Simone Calderara, Rita Cucchiara

TL;DR

The paper identifies an error accumulation in long diffusion sampling chains that resembles exposure bias, and introduces DDPM-IP, a simple training regularization that perturbs the ground-truth inputs to mimic inference-time prediction errors. The perturbation smooths the learned denoising function, improving sample quality while reducing both training and inference time. Results on multiple datasets show substantial FID/sFID gains, faster convergence, and faster inference, with recall and precision largely preserved. The approach is architecture-agnostic, easy to plug in, and complementary to existing acceleration techniques.

Abstract

Denoising Diffusion Probabilistic Models have shown impressive generation quality, although their long sampling chain leads to high computational costs. In this paper, we observe that a long sampling chain also leads to an error accumulation phenomenon, which is similar to the exposure bias problem in autoregressive text generation. Specifically, we note that there is a discrepancy between training and testing, since the former is conditioned on the ground truth samples, while the latter is conditioned on the previously generated results. To alleviate this problem, we propose a very simple but effective training regularization, which consists of perturbing the ground truth samples to simulate the inference-time prediction errors. We empirically show that, without affecting the recall and precision, the proposed input perturbation leads to a significant improvement in the sample quality while reducing both the training and the inference times. For instance, on CelebA 64$\times$64, we achieve a new state-of-the-art FID score of 1.27, while saving 37.5% of the training time. The code is publicly available at https://github.com/forever208/DDPM-IP
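
The abstract describes the regularization only in words, so the following is a minimal sketch of how a perturbed training input could be built, not the authors' implementation. It assumes the perturbation is added to the noise term of the standard DDPM forward process, i.e. $y_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,(\epsilon + \gamma\,\xi)$ with $\xi \sim \mathcal{N}(0, I)$, while the regression target stays the unperturbed $\epsilon$. The function names, the linear noise schedule, and the default $\gamma = 0.1$ are illustrative assumptions.

```python
import numpy as np


def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear schedule (assumed here)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def perturbed_training_input(x0, t, alpha_bar, gamma=0.1, rng=None):
    """Build a perturbed network input y_t and its regression target eps.

    x0        : clean samples, shape (batch, ...)
    t         : integer timesteps, shape (batch,)
    alpha_bar : cumulative schedule, shape (T,)
    gamma     : perturbation strength (hypothetical default);
                gamma = 0 gives the usual DDPM training input
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(x0.shape)  # noise the network must predict
    xi = rng.standard_normal(x0.shape)   # extra noise simulating inference-time error
    # Reshape alpha_bar[t] so it broadcasts over the non-batch dimensions.
    a = alpha_bar[t].reshape(-1, *([1] * (x0.ndim - 1)))
    y_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * (eps + gamma * xi)
    return y_t, eps
```

In a training loop under these assumptions, the network would receive `y_t` and `t` and be trained with the usual squared error against `eps`; only the input changes, which is why the method can be added to an existing DDPM training script without touching the architecture or the sampler.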

Paper Structure (23 sections, 20 equations, 16 figures, 10 tables, 4 algorithms)

This paper contains 23 sections, 20 equations, 16 figures, 10 tables, 4 algorithms.

Figures (16)

  • Figure 1: The inputs and the prediction targets are different in vanilla DDPM, DDPM-IP and DDPM-$y$.
  • Figure 2: The inference time standard deviation $\nu_{t}$ of the prediction error of a pre-trained network with respect to the sampling step $t$. The mean of the blue and the orange curve is 0.20 and 0.19, respectively.
  • Figure 3: FID scores with respect to the number of training iterations. Each FID value is computed using $T' =1,000$ inference sampling steps, except for the FFHQ dataset, for which we used $T' =100$.
  • Figure 4: CIFAR10: FID scores with respect to the number of training iterations with different $\gamma$ values. Each FID score is computed using $T' = 100$ inference sampling steps.
  • Figure 5: Visualization of the exposure bias problem with different diffusion chain lengths.
  • ...and 11 more figures