GUD: Generation with Unified Diffusion

Mathis Gerdes, Max Welling, Miranda C. N. Cheng

TL;DR

A unified framework for diffusion generative models with greatly enhanced design freedom is developed, introducing soft-conditioning models that smoothly interpolate between standard diffusion models and autoregressive models (in any basis), conceptually bridging these two approaches.

Abstract

Diffusion generative models transform noise into data by inverting a process that progressively adds noise to data samples. Inspired by concepts from the renormalization group in physics, which analyzes systems across different scales, we revisit diffusion models by exploring three key design aspects: 1) the choice of representation in which the diffusion process operates (e.g. pixel-, PCA-, Fourier-, or wavelet-basis), 2) the prior distribution that data is transformed into during diffusion (e.g. Gaussian with covariance $\Sigma$), and 3) the scheduling of noise levels applied separately to different parts of the data, captured by a component-wise noise schedule. Incorporating the flexibility in these choices, we develop a unified framework for diffusion generative models with greatly enhanced design freedom. In particular, we introduce soft-conditioning models that smoothly interpolate between standard diffusion models and autoregressive models (in any basis), conceptually bridging these two approaches. Our framework opens up a wide design space which may lead to more efficient training and data generation, and paves the way to novel architectures integrating different generative approaches and generation tasks.

Paper Structure (31 sections, 34 equations, 8 figures, 1 table)

Figures (8)

  • Figure 1: Different noising schedules $\boldsymbol \gamma(t)$.
  • Figure 2: For eight of the PCA components $\chi_i$ of CIFAR-10, we visualize the OU noise level ${\sigma}_i(t)$, the corresponding noising path ${\gamma}_i(t)={\rm logit}(\sigma_i^2(t))$ for the linear schedule of equation \ref{eq:linear-schedule}, and the corresponding signal-to-noise ratio. Blue dashed lines indicate chosen minimal noising/reconstruction levels. From left to right: (a) Standard diffusion where ${\gamma}_i={\gamma}_j$. (b) The schedule $\boldsymbol \gamma$ is chosen such that $\log{\rm SNR}_i(t)=\log{\rm SNR}_j(t)$, corresponding to a generative process with no hierarchy. (c) With whitened data, with the schedule $\boldsymbol \gamma$ chosen such that $\log{\rm SNR}_i(t)$ is identical to that in the standard diffusion case shown in column (a). (d) Hierarchy-less generation with whitened data.
  • Figure 3: Dependence of model quality in terms of negative log-likelihood (left) and FID (right) on the softness parameter for the linear schedule in § \ref{sec:autoregressiveness}. The schedule is defined in PCA components and results are shown both for unwhitened and whitened data scaling (i.e. white and data-matching priors). Training on CIFAR-10 using a single score-network for each choice of scaling. Standard diffusion corresponds to $a=1$ in the unwhitened case.
  • Figure 4: Diffusion forward process for a single image of CIFAR-10: (a) standard diffusion, (b) variance-matching Gaussian noise with same SNR as standard diffusion, (c) column-wise sequential schedule of § \ref{sec:column} with $b=0.5$, (d) combination of Haar wavelet and column-sequential schedule of § \ref{sec:haar} with $a=0.5$, and with variance-matching Gaussian noise.
  • Figure 5: The model quality with a two-parameter family of schedules controlling the softness and the ordering parameters. The right figure is the region in the box on the left and the same color map is shared. The black dots indicate the parameters corresponding to standard diffusion models.
  • ...and 3 more figures
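
The relation ${\gamma}_i(t)={\rm logit}(\sigma_i^2(t))$ from the Figure 2 caption, and the difference between a shared schedule (${\gamma}_i={\gamma}_j$) and one with equal $\log{\rm SNR}_i$ across components, can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. It assumes a variance-preserving forward process with unit-variance noise, $x_i(t)=\sqrt{1-\sigma_i^2}\,x_i(0)+\sigma_i\,\epsilon$, so that ${\rm SNR}_i=\lambda_i(1-\sigma_i^2)/\sigma_i^2$ with $\lambda_i$ the data variance of component $i$. The function names, the variances in `data_var`, and the value of `gamma_shared` are all hypothetical.

```python
import numpy as np


def noise_level_sq(gamma):
    """sigma_i^2 obtained by inverting gamma_i = logit(sigma_i^2)."""
    return 1.0 / (1.0 + np.exp(-gamma))


def log_snr(gamma, data_var):
    """Per-component log SNR under the assumed forward process.

    Assumption (not from the source): SNR_i = data_var_i * (1 - s2) / s2,
    which simplifies to data_var_i * exp(-gamma_i).
    """
    return np.log(data_var) - gamma


def noise_components(x0, gamma, rng):
    """Noise each component of x0 with its own level gamma_i (assumed VP form)."""
    s2 = noise_level_sq(gamma)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(1.0 - s2) * x0 + np.sqrt(s2) * eps


# Hypothetical variances of three PCA components (largest first).
data_var = np.array([4.0, 1.0, 0.25])
gamma_shared = 0.0  # hypothetical value of a shared schedule at some time t

# Shared schedule: every component has the same gamma, so log SNR
# differs between components by their log-variance.
gamma_standard = np.full(data_var.shape, gamma_shared)

# Equal-SNR schedule: shifting gamma_i by log(data_var_i) cancels the
# variance dependence, so all components share one log SNR.
gamma_flat = gamma_shared + np.log(data_var)

rng = np.random.default_rng(0)
x0 = rng.standard_normal(3) * np.sqrt(data_var)
xt = noise_components(x0, gamma_flat, rng)

print(log_snr(gamma_standard, data_var))
print(log_snr(gamma_flat, data_var))
```

Under these assumptions, shifting each ${\gamma}_i$ by $\log\lambda_i$ is enough to equalize the per-component SNR; the schedules actually used in the paper are defined by its own equations, which are not reproduced here.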