Scale Space Diffusion

Soumik Mukhopadhyay, Prateksha Udhayanan, Abhinav Shrivastava

TL;DR

Scale Space Diffusion uses downsampling as the diffusion degradation. To support it, the authors introduce Flexi-UNet, a UNet variant that performs resolution-preserving and resolution-increasing denoising using only the necessary parts of the network. The framework is evaluated on CelebA and ImageNet.

Abstract

Diffusion models degrade images through noise, and reversing this process reveals an information hierarchy across timesteps. Scale-space theory exhibits a similar hierarchy via low-pass filtering. We formalize this connection and show that highly noisy diffusion states contain no more information than small, downsampled images, which raises the question of why they must be processed at full resolution. To address this, we fuse scale spaces into the diffusion process by formulating a family of diffusion models with generalized linear degradations and practical implementations. Using downsampling as the degradation yields our proposed Scale Space Diffusion. To support Scale Space Diffusion, we introduce Flexi-UNet, a UNet variant that performs resolution-preserving and resolution-increasing denoising using only the necessary parts of the network. We evaluate our framework on CelebA and ImageNet and analyze its scaling behavior across resolutions and network depths. Our project website (https://prateksha.github.io/projects/scale-space-diffusion/) is publicly available.
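As a rough illustration of using downsampling as the degradation, the sketch below draws a noisy state $x_t$ at a step-dependent resolution $r(t)$ by block-averaging the clean image and adding Gaussian noise at that resolution. The resolution schedule `resolution_of`, the cosine noise schedule, average pooling as the downsampling operator, and all helper names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

T = 1000  # hypothetical number of diffusion steps

def resolution_of(t: int, base: int = 64, levels: int = 4) -> int:
    """Hypothetical schedule r(t): equal chunks of [0, T), halving resolution per chunk."""
    level = min(t * levels // T, levels - 1)
    return base // (2 ** level)

def avg_pool(x: np.ndarray, factor: int) -> np.ndarray:
    """Downsample an (H, W) image by averaging non-overlapping factor x factor blocks."""
    h, w = x.shape
    assert h % factor == 0 and w % factor == 0
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# Variance-preserving cosine schedule (an assumption, not taken from the paper).
steps = np.arange(1, T + 1)
alpha_bar = np.cos(0.5 * np.pi * steps / (T + 1)) ** 2
alpha = np.sqrt(alpha_bar)
sigma = np.sqrt(1.0 - alpha_bar)

def make_noisy(x0: np.ndarray, t: int, rng: np.random.Generator):
    """Sample x_t = alpha_t * D_{r(t)} x0 + sigma_t * eps, with eps drawn at resolution r(t)."""
    r = resolution_of(t, base=x0.shape[0])
    mean = alpha[t] * avg_pool(x0, x0.shape[0] // r)
    eps = rng.standard_normal(mean.shape)
    return mean + sigma[t] * eps, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 64))
for t in (0, 300, 600, 999):
    xt, _ = make_noisy(x0, t, rng)
    print(t, xt.shape)
```

Average pooling is used here only because it is a simple linear low-pass downsampler; any linear downsampling operator fits the same template.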

Paper Structure (31 sections, 3 theorems, 41 equations, 45 figures, 14 tables, 5 algorithms)

Key Result

Theorem 1

Let a generalized linear diffusion process be defined by …, and suppose the marginal distribution satisfies …. Then the transition mean and covariance are given by …
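A hedged sketch of the standard linear-Gaussian consistency argument behind a forward-transition result of this kind, assuming marginals $q(x_t \mid x_0) = \mathcal{N}(M_t x_0, \Sigma_t)$ and a Markov transition $x_t = A_t x_{t-1} + \eta_t$. The symbols $M_t$, $A_t$, $\Sigma_t$, $Q_t$ are illustrative and may not match the paper's notation:

```latex
\begin{aligned}
x_t &= A_t x_{t-1} + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, Q_t),\\
q(x_t \mid x_0) &= \mathcal{N}\!\left(A_t M_{t-1} x_0,\; A_t \Sigma_{t-1} A_t^\top + Q_t\right)
  \overset{!}{=} \mathcal{N}\!\left(M_t x_0,\; \Sigma_t\right),\\
\Longrightarrow\quad M_t &= A_t M_{t-1}, \qquad Q_t = \Sigma_t - A_t \Sigma_{t-1} A_t^\top \succeq 0 .
\end{aligned}
```

For instance, if $M_t = \alpha_t D_t$ with $D_t = P_t D_{t-1}$ for a 2$\times$2 average-pooling matrix $P_t$ (so $P_t P_t^\top = \tfrac14 I$) and $\Sigma_t = \sigma_t^2 I$, then $A_t = (\alpha_t/\alpha_{t-1}) P_t$ and $Q_t = \bigl(\sigma_t^2 - \tfrac{\alpha_t^2}{4\alpha_{t-1}^2}\sigma_{t-1}^2\bigr) I$, which must be nonnegative for such a transition to exist. With $P_t = I$ (a resolution-preserving step) and a variance-preserving schedule, this reduces to the familiar DDPM value $Q_t = (1 - \alpha_t^2/\alpha_{t-1}^2)\, I$.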

Figures (45)

  • Figure 1: (a) Our proposed Scale Space Diffusion fuses scale spaces into diffusion models. (b) Trends in image generation performance versus time for our proposed Flexi-UNet on CelebA-64, CelebA-128, and CelebA-256. Multiple points on the same plot represent our models with different numbers of levels (i.e., numbers of intermediate resolutions). We see large gains in efficiency as resolution scales, while performance remains reasonable.
  • Figure 2: Information Analysis. (a) Amount of information present in a diffusion state as diffusion step $t$ changes. (b) Amount of information present in images at various resolutions (scales).
  • Figure 3: Overview. Left: During training, $x_t$ at resolution $r(t)$ is sampled using Eq. \ref{eq:make_noisy_ssd_matrix}, and our model is trained to predict the clean image $x_{0,\theta}^{r(t-1)}$ using the loss in Eq. \ref{eq:loss_ssd}. Our Flexi-UNet processes both resolution-preserving and resolution-changing steps at multiple resolutions using only parts of the network. Right-top: During sampling, Eq. \ref{eq:reverse_step_ddpm_matrix_sigma_simplified_isotropic} is used to progressively denoise and upsample to generate images. Right-bottom: Our Flexi-UNet has additional 1$\times$1 Conv layers to take inputs at any UNet encoder block and produce outputs from any decoder block. For resolution-changing steps, the skip connections are fed with zero-filled tensors.
  • Figure 3: ImageNet-64 Results. Unconditional image generation results on ImageNet-64 dataset.
  • Figure 4: Visual samples. Top: ImageNet-64 unconditional generation. For the top-most sample, we also show model predictions at various scales (8, 16, 32, 64) during Scale Space Diffusion (SSD) sampling. Bottom: CelebA-256 unconditional generation. For the top-most sample, we also show model predictions at various scales (8, 16, 32, 64, 128, 256).
  • ...and 40 more figures
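One way to read the Flexi-UNet description in the Figure 3 overview is as a routing rule: a 1$\times$1 Conv feeds the input into the encoder block at the input resolution, and another reads the output from the decoder block at the target resolution. The sketch below computes such a route for a UNet whose levels have resolutions `block_res`. The names `RoutePlan` and `plan_route` are hypothetical, and zero-filling only the skips that lack a matching encoder activation is our assumption about how the caption's zero-filled skip connections apply.

```python
from dataclasses import dataclass

@dataclass
class RoutePlan:
    encoder_blocks: list[int]  # encoder level indices actually executed
    decoder_blocks: list[int]  # decoder level indices actually executed (deepest first)
    zero_skips: list[int]      # decoder levels whose skip input is a zero-filled tensor

def plan_route(block_res: list[int], r_in: int, r_out: int) -> RoutePlan:
    """Hypothetical routing for a UNet whose level i has resolution block_res[i] (descending)."""
    if r_in not in block_res or r_out not in block_res:
        raise ValueError("resolution not supported by this UNet")
    enter = block_res.index(r_in)   # 1x1 Conv stem feeds this encoder level
    exit_ = block_res.index(r_out)  # 1x1 Conv head reads this decoder level
    if exit_ > enter:
        raise ValueError("only resolution-preserving or resolution-increasing steps")
    depth = len(block_res)
    encoder = list(range(enter, depth))
    decoder = list(range(depth - 1, exit_ - 1, -1))
    # Decoder levels above the entry level have no encoder activation to skip from.
    zero = [i for i in decoder if i < enter]
    return RoutePlan(encoder, decoder, zero)

print(plan_route([64, 32, 16, 8], 16, 16))  # resolution-preserving step at 16x16
print(plan_route([64, 32, 16, 8], 16, 32))  # resolution-increasing step 16 -> 32
```

Under this reading, a resolution-preserving step at a coarse scale skips the high-resolution levels entirely, which is where the compute savings would come from.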

Theorems & Definitions (6)

  • Theorem 1: Forward Transition
  • proof
  • Theorem 2: Posterior Distribution
  • proof
  • Theorem 3: Closed-Form Posterior Under Isotropic Marginals
  • proof
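Theorems 2 and 3 concern the posterior $q(x_{t-1} \mid x_t, x_0)$ used for sampling. As a hedged sketch, assuming a linear-Gaussian transition $x_t = A x_{t-1} + \mathcal{N}(0, Q)$ and a Gaussian marginal $x_{t-1} \mid x_0 \sim \mathcal{N}(\mu_{t-1}, \Sigma_{t-1})$, the posterior follows from standard Gaussian conditioning; the paper's closed form may be arranged differently, and the function name `gaussian_posterior` is hypothetical.

```python
import numpy as np

def gaussian_posterior(mu_prev, cov_prev, A, Q, x_t):
    """Mean and covariance of x_{t-1} | x_t, x_0 for x_t = A x_{t-1} + N(0, Q),
    given the prior x_{t-1} | x_0 ~ N(mu_prev, cov_prev). A may be non-square
    (e.g. a downsampling matrix) for resolution-changing steps."""
    S = A @ cov_prev @ A.T + Q              # covariance of x_t | x_0
    K = np.linalg.solve(S, A @ cov_prev).T  # gain cov_prev A^T S^{-1} (S, cov_prev symmetric)
    mean = mu_prev + K @ (x_t - A @ mu_prev)
    cov = cov_prev - K @ A @ cov_prev
    return mean, cov

# Usage: an isotropic, resolution-preserving step with DDPM-style coefficients.
rng = np.random.default_rng(0)
a_t, abar_prev = 0.9, 0.5
I = np.eye(4)
x0, x_t = rng.standard_normal(4), rng.standard_normal(4)
mean, cov = gaussian_posterior(np.sqrt(abar_prev) * x0, (1 - abar_prev) * I,
                               np.sqrt(a_t) * I, (1 - a_t) * I, x_t)
```

With $A = \sqrt{a_t}\, I$, $Q = (1 - a_t) I$ and isotropic marginals, this reduces to the familiar DDPM posterior; passing a pooling matrix as $A$ gives a resolution-changing case under the same assumptions.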