Blue noise for diffusion models

Xingchang Huang, Corentin Salaün, Cristina Vasconcelos, Christian Theobalt, Cengiz Öztireli, Gurprit Singh

TL;DR

The paper addresses a mismatch in diffusion models, where Gaussian white noise is used at every time step regardless of the frequency content the denoising network reconstructs, by introducing correlated, time-varying blue noise during training and sampling. It develops a deterministic diffusion framework with forward process $x_t = \alpha_t (L_t \boldsymbol{\epsilon}) + (1-\alpha_t) x_0$ and a backward path that learns two terms, along with a fast method to generate Gaussian blue-noise masks on the fly via $\mathbf{b} = L \boldsymbol{\epsilon}$, where $LL^{\top} = \Sigma$. A rectified mini-batch mapping further improves gradient flow by aligning noise with data samples. Empirical results on CelebA, AFHQ-Cat, and LSUN demonstrate improved FID and perceptual quality over IADB and competitive performance with DDIM, with extensive ablations highlighting the benefits and trade-offs of the blue-noise components and time-varying mixing. Limitations include resolution-dependent scheduler tuning and mask-generation cost, suggesting future work on broader noise types, stochastic variants, and extensions to video or 3D data.
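
The forward process and mask generation summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `toy_blue_covariance`, the coupling constant `c`, the blend weight `gamma`, and the linear blend used for `L_t` are all assumptions made for illustration. In particular, the toy covariance (negative correlation between neighboring pixels) merely stands in for the covariance of real Gaussian blue-noise masks.

```python
import numpy as np


def toy_blue_covariance(n, c=0.24):
    """Toy covariance for an n x n image (n >= 3), flattened to n*n pixels.

    Each pixel is negatively correlated with its four periodic neighbors,
    which suppresses low frequencies. Hypothetical stand-in for the
    covariance of real Gaussian blue-noise masks.
    """
    idx = np.arange(n * n).reshape(n, n)
    sigma = np.eye(n * n)
    for axis in (0, 1):
        nb = np.roll(idx, 1, axis=axis)  # neighbor one step back along axis
        sigma[idx.ravel(), nb.ravel()] -= c
        sigma[nb.ravel(), idx.ravel()] -= c
    return sigma  # positive definite for c < 0.25 (diagonal dominance)


def forward_sample(x0, eps, L_t, alpha_t):
    """x_t = alpha_t * (L_t eps) + (1 - alpha_t) * x0 on flattened images."""
    return alpha_t * (L_t @ eps) + (1.0 - alpha_t) * x0


n = 8
sigma = toy_blue_covariance(n)
L_blue = np.linalg.cholesky(sigma)  # lower triangular, L L^T = sigma

rng = np.random.default_rng(0)
eps = rng.standard_normal(n * n)    # white Gaussian noise
b = L_blue @ eps                    # correlated mask b = L eps

# Assumed time-varying factor: a linear blend of the identity (white noise)
# and the toy blue-noise factor. gamma is a hypothetical schedule value.
gamma = 0.5
L_t = gamma * np.eye(n * n) + (1.0 - gamma) * L_blue

x0 = rng.uniform(-1.0, 1.0, n * n)  # placeholder "image"
x_t = forward_sample(x0, eps, L_t, alpha_t=0.3)
```

Because `L_blue` is computed once and reused, each new mask costs only one matrix-vector product, which is what makes on-the-fly generation practical at small resolutions.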

Abstract

Most existing diffusion models use Gaussian noise for training and sampling across all time steps, which may not optimally account for the frequency content reconstructed by the denoising network. Despite the diverse applications of correlated noise in computer graphics, its potential for improving the training process has been underexplored. In this paper, we introduce a novel and general class of diffusion models that take correlated noise within and across images into account. More specifically, we propose a time-varying noise model to incorporate correlated noise into the training process, as well as a method for fast generation of correlated noise masks. Our model is built upon deterministic diffusion models and utilizes blue noise to help improve the generation quality compared to using Gaussian white (random) noise only. Further, our framework allows introducing correlation across images within a single mini-batch to improve gradient flow. We perform both qualitative and quantitative evaluations on a variety of datasets using our method, achieving improvements on different tasks over existing deterministic diffusion models in terms of the FID metric.
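
The idea of correlating noise with images inside a mini-batch can be sketched as a re-pairing step: instead of pairing each image with an arbitrary noise sample, noise samples are reassigned so that each pair is closer together. The greedy nearest-neighbor assignment below is an assumed stand-in chosen for brevity, not the paper's algorithm, and `greedy_pairing` is a hypothetical helper name.

```python
import numpy as np


def greedy_pairing(noise, images):
    """Greedily pair each image with its closest unused noise sample.

    Returns perm such that noise[perm[i]] is paired with images[i].
    Distances are squared L2 over flattened samples.
    """
    batch = images.shape[0]
    diff = images.reshape(batch, 1, -1) - noise.reshape(1, batch, -1)
    dist = (diff ** 2).sum(axis=-1)          # dist[i, j]: image i vs noise j
    perm = np.full(batch, -1)
    free = np.ones(batch, dtype=bool)
    for i in range(batch):
        candidates = np.where(free, dist[i], np.inf)  # mask used noise
        j = int(np.argmin(candidates))
        perm[i] = j
        free[j] = False
    return perm


rng = np.random.default_rng(0)
noise = rng.standard_normal((8, 3, 16, 16))
images = rng.uniform(-1.0, 1.0, (8, 3, 16, 16))  # placeholder data

perm = greedy_pairing(noise, images)
paired_noise = noise[perm]  # paired_noise[i] is used with images[i]
```

A greedy pass is order-dependent and not globally optimal; an optimal assignment solver could be substituted where the extra cost per mini-batch is acceptable.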

Paper Structure (23 sections, 7 equations, 12 figures, 3 tables, 3 algorithms)

Figures (12)

  • Figure 1: Schema of the diffusion process using our time-varying noise. The diffusion transforms the initial noise distribution (blue) into the target data distribution (red). Five examples are shown with the intermediate diffusion steps between the two distributions. For one of the data samples, we illustrate the intermediate time steps with the current expected result and noise. The evolution of the noise from random to blue noise is visible, as is the quality of the expected result.
  • Figure 2: To generate Gaussian blue noise masks $\mathbf{b}$ on the fly (leftmost), we pre-compute a lower triangular matrix $L$. We then multiply this matrix with Gaussian noise $\boldsymbol{\epsilon}$ to obtain $\mathbf{b} = L \boldsymbol{\epsilon}$. For Gaussian noise of size $64\times64$, the lower triangular matrix has a size of $64^2 \times 64^2$. Here we show a $64 \times 64$ zoomed-in version of the matrix to better visualize the positive and negative correlations, shown as white and black lines, respectively.
  • Figure 3: Visualization of linearly interpolated noises from Gaussian noise to Gaussian blue noise at resolution $64^2$ (top row), and the corresponding frequency power spectra (bottom row).
  • Figure 4: Visualization of the impact of rectified mapping on the pairing of noise and images within a mini-batch. Blue and red points respectively represent the randomly selected noise and target images sampled in a given mini-batch from the underlying blue and red distributions. Standard practice (a) consists of a random mapping between the noise and target images. Our rectified mapping (b) improves on it by reducing the distance between each data pair. One example of noise and target from the mini-batch is shown in (c). This example has been generated using our noise mask and mapping algorithm.
  • Figure 5: Image super-resolution comparisons between IADB (SSIM/PSNR=0.57/19.46) and Ours (SSIM/PSNR=0.59/20.00) on LSUN-Church ($32^2 \rightarrow 128^2$). The mean squared error w.r.t. the reference is shown in the upper corner, together with the error relative to IADB. Our method achieves lower error and more plausible details with less hallucination.
  • ...and 7 more figures
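
The kind of comparison described for Figure 3 (interpolating between white and blue noise and inspecting the power spectra) can be reproduced in spirit with the sketch below. It makes strong simplifying assumptions: `toy_high_pass` is a hypothetical high-pass filter used in place of real Gaussian blue-noise masks, and `gamma` is an illustrative blend weight rather than the paper's schedule.

```python
import numpy as np


def power_spectrum(img):
    """Centered 2D power spectrum |FFT|^2 of a single-channel image."""
    return np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2


def toy_high_pass(eps):
    """Stand-in for a blue-noise mask: subtract the periodic 4-neighbor mean.

    This removes the DC component and damps low frequencies.
    """
    neighbors = sum(np.roll(eps, s, axis=a) for s in (-1, 1) for a in (0, 1))
    return eps - neighbors / 4.0


rng = np.random.default_rng(0)
eps = rng.standard_normal((64, 64))  # white Gaussian noise
blue = toy_high_pass(eps)            # toy blue-like noise

# gamma = 1 gives white noise, gamma = 0 gives the toy blue noise.
spectra = {}
for gamma in (1.0, 0.5, 0.0):
    mixed = gamma * eps + (1.0 - gamma) * blue
    spectra[gamma] = power_spectrum(mixed)

dc = (32, 32)  # position of the zero frequency after fftshift (64 // 2)
```

Plotting each entry of `spectra` (for example on a log scale) should show energy progressively drained from the center of the spectrum as `gamma` decreases.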