UDPM: Upsampling Diffusion Probabilistic Models

Shady Abu-Hussein, Raja Giryes

TL;DR

Denoising Diffusion Probabilistic Models (DDPMs) achieve high-fidelity image synthesis but require many diffusion steps, making sampling costly. UDPM generalizes this framework by inserting a downsampling operator $\mathcal{H}$ into the forward process, so that latent dimensions shrink as diffusion proceeds and the reverse process both denoises and upsamples. With $L=3$ steps, UDPM delivers competitive image quality on FFHQ, AFHQv2, and CIFAR-10 at roughly 30% of the cost of a single DDPM step, and provides an interpretable, interpolable latent space. The method relies on an ELBO objective with Gaussian posteriors, a network $f_\theta$ that predicts $\mathcal{H}^{l-1}\mathbf{x}_0$, and FFT-based efficient computations, and it supports both unconditional and class-conditioned generation. Overall, UDPM offers a practical, faster diffusion-based generator with latent controllability and potential for editing and data-efficient synthesis.
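The "roughly 30%" figure can be reproduced with a back-of-envelope model. This sketch rests on assumptions not stated in this summary: a network evaluation costs in proportion to its input's pixel count, $\mathcal{H}$ halves each spatial side ($\gamma=2$), and step $l$ feeds $\mathbf{x}_l$ to the network (as in Figure 4). For $64\times 64$ images the three network inputs are then $32\times32$, $16\times16$, and $8\times8$. The function `relative_cost` is a hypothetical helper.

```python
# Hypothetical cost model (an assumption, not a measurement): one network
# evaluation costs in proportion to the number of pixels it receives.
def relative_cost(full_res=64, stride=2, steps=3):
    """Cost of UDPM sampling relative to one full-resolution diffusion step."""
    total = 0.0
    res = full_res
    for _ in range(steps):
        res //= stride                     # x_l is `stride` times smaller per side than x_{l-1}
        total += (res / full_res) ** 2     # pixel-count ratio of this step's network input
    return total

print(relative_cost())  # 0.328125, i.e. 1/4 + 1/16 + 1/64
```

Under these assumptions the total is $21/64 \approx 0.33$, close to the quoted 0.3; actual costs also depend on the architecture used at each resolution.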

Abstract

Denoising Diffusion Probabilistic Models (DDPM) have recently gained significant attention. DDPMs compose a Markovian process that begins in the data domain and gradually adds noise until reaching pure white noise. DDPMs generate high-quality samples from complex data distributions by defining an inverse process and training a deep neural network to learn this mapping. However, these models are inefficient because they require many diffusion steps to produce aesthetically pleasing samples. Additionally, unlike generative adversarial networks (GANs), the latent space of diffusion models is less interpretable. In this work, we propose to generalize the denoising diffusion process into an Upsampling Diffusion Probabilistic Model (UDPM). In the forward process, we reduce the latent variable dimension through downsampling, followed by the traditional noise perturbation. As a result, the reverse process gradually denoises and upsamples the latent variable to produce a sample from the data distribution. We formalize the Markovian diffusion processes of UDPM and demonstrate its generation capabilities on the popular FFHQ, AFHQv2, and CIFAR10 datasets. UDPM generates images with as few as three network evaluations, whose overall computational cost is less than a single DDPM or EDM step, while achieving an FID score of 6.86. This surpasses current state-of-the-art efficient diffusion models that use a single denoising step for sampling. Additionally, UDPM offers an interpretable and interpolable latent space, which gives it an advantage over traditional DDPMs. Our code is available online: \url{https://github.com/shadyabh/UDPM/}
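The "interpolable latent space" (see also Figure 5) can be illustrated with a minimal sketch of four-corner latent interpolation. It assumes bilinear weights over a square grid and a single latent array per image; since UDPM draws noise at every step, the same weights would presumably be applied to each of those noises. The helper `bilinear_latent_grid` is hypothetical, and its `preserve_variance` rescaling is our addition, not taken from the paper.

```python
import numpy as np

def bilinear_latent_grid(z00, z01, z10, z11, n=5, preserve_variance=True):
    """Mix four corner latents into an n x n grid of 'in-between' latents.

    Weights are bilinear in grid coordinates (u, v) in [0, 1]. If
    preserve_variance is set (an assumption, not from the paper), each mixture
    is divided by sqrt(sum of squared weights), so iid N(0, 1) corners give
    N(0, 1) mixtures again.
    """
    grid = np.empty((n, n) + z00.shape, dtype=np.float64)
    for i, u in enumerate(np.linspace(0.0, 1.0, n)):
        for j, v in enumerate(np.linspace(0.0, 1.0, n)):
            w = np.array([(1 - u) * (1 - v), (1 - u) * v, u * (1 - v), u * v])
            z = w[0] * z00 + w[1] * z01 + w[2] * z10 + w[3] * z11
            if preserve_variance:
                z = z / np.sqrt(np.sum(w ** 2))
            grid[i, j] = z
    return grid

rng = np.random.default_rng(0)
corners = [rng.standard_normal((3, 8, 8)) for _ in range(4)]
grid = bilinear_latent_grid(*corners, n=5)
print(grid.shape)  # (5, 5, 3, 8, 8)
```

Each grid entry would then be decoded by the sampler to give one "in-between" image; the corners reproduce the original latents exactly.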

Paper Structure (20 sections, 1 theorem, 41 equations, 12 figures, 5 tables, 2 algorithms)

Key Result

Lemma 1

Let $\mathbf{e} \stackrel{iid}{\sim} \mathcal{N}(0, \mathbf{I}) \in \mathbb{R}^N$ and $\mathcal{H} = \mathcal{S}_\gamma \mathcal{W}$, where $\mathcal{S}_\gamma$ is a subsampling operator with stride $\gamma$ and $\mathcal{W}$ is a blur operator with blur kernel $\mathbf{w}$. Then, if the support of
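Because $\mathcal{H}$ is linear, $\mathcal{H}\mathbf{e}$ is Gaussian with covariance $\mathcal{H}\mathcal{H}^T$ when $\mathbf{e} \sim \mathcal{N}(0,\mathbf{I})$; this standard fact holds regardless of how the truncated statement above concludes. The sketch below builds $\mathcal{H} = \mathcal{S}_\gamma \mathcal{W}$ as an explicit matrix for short 1D signals, assuming circular boundaries and a spatial kernel (`blur_subsample_matrix` is a hypothetical helper), and compares two kernels. It illustrates one simple case, not the lemma's condition: when the kernel is no longer than the stride, the rows of $\mathcal{H}$ have disjoint supports and $\mathcal{H}\mathcal{H}^T = \|\mathbf{w}\|^2\mathbf{I}$.

```python
import numpy as np

def blur_subsample_matrix(w, n, gamma):
    """Explicit matrix of H = S_gamma W for 1D signals of length n.

    W: circular convolution with kernel w (periodic boundaries are an assumption).
    S_gamma: keep every gamma-th sample.
    """
    W = np.zeros((n, n))
    for i in range(n):
        for k, wk in enumerate(w):
            W[i, (i - k) % n] += wk   # (W x)[i] = sum_k w[k] * x[i - k]
    return W[::gamma]                 # S_gamma keeps rows 0, gamma, 2*gamma, ...

n, gamma = 8, 2
H = blur_subsample_matrix(np.array([0.5, 0.5]), n, gamma)
cov = H @ H.T                         # covariance of H e for e ~ N(0, I)

H_wide = blur_subsample_matrix(np.array([0.25, 0.5, 0.25]), n, gamma)
cov_wide = H_wide @ H_wide.T

print(np.allclose(cov, 0.5 * np.eye(n // gamma)))          # True: disjoint row supports
print(np.allclose(cov_wide, np.diag(np.diag(cov_wide))))   # False: neighbouring rows overlap
```

With the wider kernel, adjacent output samples share an input sample, so $\mathcal{H}\mathbf{e}$ has correlated entries.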

Figures (12)

  • Figure 1: The Upsampling Diffusion Probabilistic Model (UDPM) scheme for 3 diffusion steps ($L=3$). In addition to the gradual noise perturbation in traditional DDPMs, UDPM also downsamples the latent variables. Accordingly, in the reverse process, UDPM denoises and upsamples the latent variables to generate images from the data distribution.
  • Figure 2: Generated $64\times 64$ images of AFHQv2 choi2020stargan with FID=7.10142, produced by unconditional UDPM with only 3 steps, whose combined cost is equivalent to 0.3 of a single typical $64\times 64$ diffusion step.
  • Figure 3: Generated $64\times 64$ images of FFHQ with FID=7.41065, produced by unconditional UDPM with only 3 steps, whose combined cost is equivalent to 0.3 of a single typical $64\times64$ diffusion step.
  • Figure 4: The training and sampling procedures of UDPM. During training, an image $\mathbf{x}_0$ is randomly selected from the dataset $\{\mathbf{x}_0\} \sim q(\mathbf{x}_0)$. It is degraded using Eq. \ref{eq:xt_given_x0} to obtain a downsampled noisy version $\mathbf{x}_l$, which is fed into a deep neural network $f_\theta^{(l)}(\cdot)$ trained to predict $\mathcal{H}^{l-1}\mathbf{x}_0$. During sampling, pure white Gaussian noise $\mathbf{x}_L \sim \mathcal{N}(0,\mathbf{I})$ is generated and passed through $f_\theta^{(L)}(\cdot)$ to estimate $\mathcal{H}^{L-1}\mathbf{x}_0$. This estimate is used to compute $\mu_L$ through Eq. \ref{eq:mu_t}, with $\Sigma_L$ obtained from Eq. \ref{eq:sigma_t}. Then $\mathbf{x}_{L-1}$ is drawn from $\mathcal{N}(\mu_L, \Sigma_L)$ using the technique described in Appendix \ref{appndx:sampling_posterior}. Repeating this procedure for $L$ iterations yields the final sample $\mathbf{x}_0$.
  • Figure 5: Latent space interpolation for $64\times 64$ generated images. The images between the four corners are generated from weighted mixtures of the corners' latent noises, so they lie "in between" the corner images in latent space, similar to what has been done in GANs karras2019style.
  • ...and 7 more figures
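The training procedure of Figure 4 can be sketched as follows. Everything specific here is an assumption: $\mathcal{H}$ is taken as a $2\times2$ box blur followed by stride-2 subsampling (with suitable kernel alignment, this is $2\times2$ average pooling), the forward step is assumed to have the form $\mathbf{x}_l = \alpha_l \mathcal{H}\mathbf{x}_{l-1} + \beta_l \mathbf{e}_l$ with a made-up schedule, and `downsample_H` and `make_training_pair` are hypothetical helpers. The sample $\mathbf{x}_l$ is drawn by simulating the chain step by step rather than through the paper's closed-form marginal.

```python
import numpy as np

def downsample_H(x):
    """Hypothetical H: 2x2 box blur + stride-2 subsampling, i.e. 2x2 average
    pooling of a (C, h, w) array with even h and w."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def make_training_pair(x0, l, alphas, betas, rng):
    """Return (x_l, target) for step l >= 1, following the Figure 4 recipe.

    x_l is drawn by simulating the assumed forward chain
        x_k = alphas[k-1] * H(x_{k-1}) + betas[k-1] * e_k,  e_k ~ N(0, I),
    and the regression target is H^{l-1} x0.
    """
    x = x0
    for k in range(1, l + 1):
        hx = downsample_H(x)
        x = alphas[k - 1] * hx + betas[k - 1] * rng.standard_normal(hx.shape)
    target = x0
    for _ in range(l - 1):
        target = downsample_H(target)
    return x, target

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 64, 64))      # stand-in for a dataset image
alphas, betas = [0.9, 0.8, 0.7], [0.3, 0.5, 0.7]    # hypothetical schedule
for l in (1, 2, 3):                                 # training would sample l at random
    x_l, target = make_training_pair(x0, l, alphas, betas, rng)
    print(l, x_l.shape, target.shape)
# 1 (3, 32, 32) (3, 64, 64)
# 2 (3, 16, 16) (3, 32, 32)
# 3 (3, 8, 8) (3, 16, 16)
```

A network $f_\theta^{(l)}$ would then map each input $\mathbf{x}_l$ to a target twice its spatial size, so every step both denoises and upsamples by a factor of 2. The training loss is not specified in this summary, so none is shown.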

Theorems & Definitions (2)

  • Lemma 1
  • proof