Multilevel Diffusion: Infinite Dimensional Score-Based Diffusion Models for Image Generation

Paul Hagemann; Sophie Mildenberger; Lars Ruthotto; Gabriele Steidl; Nicole Tianjiao Yang

TL;DR

This paper develops score-based diffusion models (SBDMs) in the infinite-dimensional setting, modeling the training data as functions supported on a rectangular domain. The authors show that the resulting models are well-posed, derive adequate discretizations, and investigate the role of the latent distributions.

Abstract

Score-based diffusion models (SBDM) have recently emerged as state-of-the-art approaches for image generation. Existing SBDMs are typically formulated in a finite-dimensional setting, where images are considered as tensors of finite size. This paper develops SBDMs in the infinite-dimensional setting, that is, we model the training data as functions supported on a rectangular domain. In addition to the quest for generating images at ever-higher resolutions, our primary motivation is to create a well-posed infinite-dimensional learning problem that we can discretize consistently on multiple resolution levels. We thereby intend to obtain diffusion models that generalize across different resolution levels and improve the efficiency of the training process. We demonstrate how to overcome two shortcomings of current SBDM approaches in the infinite-dimensional setting. First, we modify the forward process using trace class operators to ensure that the latent distribution is well-defined in the infinite-dimensional setting and derive the reverse processes for finite-dimensional approximations. Second, we illustrate that approximating the score function with an operator network is beneficial for multilevel training. After deriving the convergence of the discretization and the approximation of multilevel training, we demonstrate some practical benefits of our infinite-dimensional SBDM approach on a synthetic Gaussian mixture example, the MNIST dataset, and a dataset generated from a nonlinear 2D reaction-diffusion equation.
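To make the idea of a trace class latent distribution that can be discretized consistently across resolutions more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a covariance operator $Q$ that is diagonal in an orthonormal cosine basis on $[0,1]^2$ with eigenvalues $(1 + j^2 + k^2)^{-\text{decay}}$; the basis, the decay rate, and all function names are hypothetical choices made for illustration. With `decay > 1` the eigenvalues are summable in two dimensions, so $Q$ is trace class.

```python
import numpy as np

def cosine_basis(n, n_modes):
    """Orthonormal cosine basis on [0, 1], evaluated at n cell centers."""
    x = (np.arange(n) + 0.5) / n
    modes = np.arange(n_modes)
    scale = np.where(modes == 0, 1.0, np.sqrt(2.0))
    return scale[:, None] * np.cos(np.pi * np.outer(modes, x))  # (n_modes, n)

def sample_coefficients(n_modes, decay, rng):
    """Draw basis coefficients of one sample from N(0, Q).

    Assumed (hypothetical) spectrum: lambda_jk = (1 + j^2 + k^2)^(-decay).
    """
    j, k = np.meshgrid(np.arange(n_modes), np.arange(n_modes), indexing="ij")
    eigvals = (1.0 + j**2 + k**2) ** (-decay)
    return np.sqrt(eigvals) * rng.standard_normal((n_modes, n_modes))

def evaluate(coeffs, n):
    """Evaluate the function with the given coefficients on an n x n grid."""
    basis = cosine_basis(n, coeffs.shape[0])
    return basis.T @ coeffs @ basis  # (n, n)

rng = np.random.default_rng(0)
coeffs = sample_coefficients(n_modes=16, decay=2.0, rng=rng)
noise_32 = evaluate(coeffs, 32)  # same random function, coarse grid
noise_64 = evaluate(coeffs, 64)  # same random function, fine grid
```

Because the randomness lives in the basis coefficients rather than in the pixels, `noise_32` and `noise_64` are two discretizations of one underlying function. Pixelwise independent standard Gaussian noise does not have this property: its squared norm grows with the number of pixels, so it has no well-defined limit as the resolution increases.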

Paper Structure (34 sections, 9 theorems, 113 equations, 10 figures, 1 table)

Introduction
Contributions and main results
Related work
Finite-dimensional Score-Based Diffusion Models
Finite-dimensional Stochastic differential equations
Motivation for trace class Gaussian measures
Score-Based Diffusion Models on Infinite-Dimensional Hilbert Spaces
Forward SDE and its Discretization
Reverse Process
Multilevel Learning Approaches
FNO parameterization of the score function
Prior distributions
Experiments
Gaussian Mixture
MNIST experiment: Super-resolution
...and 19 more sections

Key Result

Theorem 3.1

Let $X_0 \in H$ be a random variable and let $W^Q$ be a $Q$-Wiener process that is independent of $X_0$. Then the forward SDE has a unique strong solution $(X_t)_{t \in [0,T]}$, which is given $\mathbb{P}$-almost surely for all $t \in [0,T]$ by the variation of constants formula. If $\mathbb E \left[ \lVert X_0 \rVert^2\right]<\infty$, then it holds that $\mathbb E \left[ \sup_{t \in [0,T]} \lVert X_t\rVert^2 \right]<\infty$.
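The theorem can be explored numerically on a finite-dimensional truncation. The sketch below is an illustration under stated assumptions, not the paper's scheme: the forward SDE is not written out in this summary, so the code assumes the common variance-preserving form $dX_t = -\tfrac{1}{2}\alpha X_t\,dt + \sqrt{\alpha}\,dW_t^Q$ with a constant $\alpha$, and it represents $X_t$ by its coefficients in an eigenbasis of $Q$, so that each coefficient is driven by an independent Brownian motion scaled by $\sqrt{\lambda_k}$. The function name, the Euler-Maruyama discretization, and the eigenvalue choice are all hypothetical.

```python
import numpy as np

def simulate_forward(x0, eigvals, alpha=1.0, T=1.0, n_steps=1000, rng=None):
    """Euler-Maruyama for the assumed SDE dX = -0.5*alpha*X dt + sqrt(alpha) dW^Q.

    x0 and eigvals hold the coefficients of X_0 and the eigenvalues of Q
    in a common eigenbasis. Returns X_T and the running maximum of ||X_t||^2.
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    x = np.array(x0, dtype=float)
    sup_sq_norm = np.sum(x**2)
    for _ in range(n_steps):
        # Increment of the Q-Wiener process: variance eigvals[k] * dt per mode.
        dw = np.sqrt(eigvals * dt) * rng.standard_normal(x.shape)
        x = x - 0.5 * alpha * x * dt + np.sqrt(alpha) * dw
        sup_sq_norm = max(sup_sq_norm, np.sum(x**2))
    return x, sup_sq_norm

k = np.arange(1, 65)
eigvals = 1.0 / k**2   # summable eigenvalues, so the truncated Q is trace class
x0 = 1.0 / k           # square-summable initial coefficients
x_T, sup_sq_norm = simulate_forward(x0, eigvals, rng=np.random.default_rng(0))
```

Setting `eigvals` to zero removes the noise, and the scheme then reproduces the deterministic decay $X_T \approx e^{-\alpha T/2} X_0$, which is a simple sanity check. Averaging `sup_sq_norm` over many runs gives a Monte Carlo estimate of $\mathbb E[\sup_{t} \lVert X_t \rVert^2]$ for the truncated system.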

Figures (10)

Figure 1: Approximation of the true score at different resolutions (red is resolution 32, blue is resolution 64). Note that the models are trained on the downsampled images, which are distinct from the resolution 32 images.
Figure 2: MNIST images generated by a U-Net trained with standard Gaussian prior at resolutions 32, 50, and 64 (left to right).
Figure 3: MNIST images generated by FNO and different priors at resolutions 32, 50, and 64 and loss curves for the MNIST example: training resolution loss (red) and loss at resolution 64 (blue).
Figure 4: Sliced Wasserstein curves for the MNIST example.
Figure 5: DSM loss curve on the MNIST test set. Left: warm starting method versus the cold start in terms of epochs. Right: warm start vs cold start in terms of time. The black vertical line depicts the training at the finer resolution, i.e., from epoch 50 we train on $64 \times 64$.
...and 5 more figures

Theorems & Definitions (19)

Remark 2.1
Theorem 3.1: Variation of constants formula
Theorem 3.2: Convergence of forward process
Proof 1
Corollary 3.3
Theorem 3.4: Discretized reverse process
Proof 2
Corollary 3.5
Proof 3
Theorem 3.6
...and 9 more
