A Note on the Convergence of Denoising Diffusion Probabilistic Models

Sokhna Diarra Mbacke; Omar Rivasplata

TL;DR

We derive a quantitative upper bound on the Wasserstein distance between the data-generating distribution and the distribution learned by a diffusion model. The bound holds for arbitrary data-generating distributions on bounded instance spaces, including those without a density w.r.t. the Lebesgue measure.

Abstract

Diffusion models are one of the most important families of deep generative models. In this note, we derive a quantitative upper bound on the Wasserstein distance between the data-generating distribution and the distribution learned by a diffusion model. Unlike previous works in this field, our result does not make assumptions on the learned score function. Moreover, our bound holds for arbitrary data-generating distributions on bounded instance spaces, even those without a density w.r.t. the Lebesgue measure, and the upper bound does not suffer from exponential dependencies. Our main result builds upon the recent work of Mbacke et al. (2023) and our proofs are elementary.
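As a rough illustration of the quantity being bounded, the sketch below estimates a Wasserstein-1 distance between two sets of samples. It is not the paper's method: it assumes one-dimensional data and equal sample sizes, in which case the optimal coupling simply matches sorted samples in order. The function name `empirical_w1` and the sample values are hypothetical.

```python
def empirical_w1(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    Assumption (not from the paper): data are scalars and both samples
    have the same nonzero size, so sorting gives the optimal matching.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes equal, nonzero sample sizes")
    # In one dimension the optimal transport plan pairs order statistics.
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)


# Hypothetical samples standing in for data and model outputs.
data_samples = [0.0, 1.0, 2.0]
model_samples = [2.5, 0.5, 1.5]
print(empirical_w1(data_samples, model_samples))
```

In higher dimensions the sorted-matching shortcut no longer applies and a general optimal transport solver would be needed.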

Paper Structure (21 sections, 5 theorems, 37 equations, 3 figures)

This paper contains 21 sections, 5 theorems, 37 equations, 3 figures.

Introduction
Related Works
Our contributions
Preliminaries
Denoising Diffusion Models
The forward process.
The backward process.
Additional Definitions
Our Approach
Main Result
Theorem Statement
Proof of the main theorem
Special case using the forward process of denoising-diffusion
The prior-matching term.
Upper-bounds on the average distance between Gaussian vectors.
...and 6 more sections

Key Result

Theorem 3.1

Assume the instance space $\mathcal{X}$ has finite diameter $\Delta = \sup_{\mathbf{x}, \mathbf{x}' \in \mathcal{X}} \left\lVert \mathbf{x} - \mathbf{x}' \right\rVert < \infty$, and let $\lambda > 0$ and $\delta \in (0, 1)$ be real numbers. Using the definitions and assumptions of the previous section, [...] where $\bm{\epsilon}, \bm{\epsilon}' \sim \mathcal{N}\left( \mathbf{0}, \mathbf{I} \right)$ are standard [...]
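To make the diameter assumption concrete, the following minimal sketch computes the largest pairwise Euclidean distance over a finite set of points. Since $\Delta$ is a supremum over the whole instance space, the value computed from samples is only a lower bound on $\Delta$. The function name `empirical_diameter` and the example points are hypothetical, not taken from the paper.

```python
import itertools
import math


def empirical_diameter(points):
    """Largest pairwise Euclidean distance among the given points.

    This is a lower bound on the diameter of the underlying instance
    space, because the supremum there ranges over all of the space.
    """
    return max(
        (math.dist(p, q) for p, q in itertools.combinations(points, 2)),
        default=0.0,  # fewer than two points: no pair to compare
    )


# Hypothetical 2-D samples.
points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(empirical_diameter(points))
```

The pairwise loop costs quadratic time in the number of points, which is acceptable for a sanity check on a few thousand samples.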

Figures (3)

Figure 1: Denoising diffusion model
Figure 2: The points represent $2000$ samples from the target data-generating distribution.
Figure 3: The points represent $2000$ samples from the trained diffusion model.

Theorems & Definitions (12)

Theorem 3.1
Remark 3.1
Lemma 3.2
Lemma 3.3
proof
Lemma 3.4
proof : Proof Idea
Lemma 3.5
proof
Remark 3.2
...and 2 more
