Step-by-Step Diffusion: An Elementary Tutorial

Preetum Nakkiran; Arwen Bradley; Hattie Zhou; Madhu Advani

TL;DR

This paper presents an accessible first course on diffusion models and flow matching for machine learning, aimed at a technical audience with no diffusion experience. It simplifies the mathematical details as much as possible while retaining enough precision to derive correct algorithms.

Abstract

We present an accessible first course on diffusion models and flow matching for machine learning, aimed at a technical audience with no diffusion experience. We try to simplify the mathematical details as much as possible (sometimes heuristically), while retaining enough precision to derive correct algorithms.

Paper Structure (47 sections, 4 theorems, 108 equations, 11 figures, 7 algorithms)

Fundamentals of Diffusion
Gaussian Diffusion
Diffusions in the Abstract
Discretization
Stochastic Sampling: DDPM
Correctness of DDPM
Technical Details [Optional]
Algorithms
Variance Reduction: Predicting $x_0$
Diffusions as SDEs [Optional]
Deterministic Sampling: DDIM
Case 1: Single Point
Velocity Fields and Gases
Case 2: Two Points
Case 3: Arbitrary Distributions
...and 32 more sections

Key Result

Lemma 1

Let $p(x)$ be an arbitrary density over $\mathbb{R}$, with bounded 1st to 4th order derivatives. Consider the joint distribution $(x_0, x_1)$, where $x_0 \sim p$ and $x_1 \sim x_0 + \mathcal{N}(0, \sigma^2)$. Then, for any conditioning $z \in \mathbb{R}$, we have where
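
The setup of the lemma ($x_0 \sim p$, $x_1 = x_0 + \mathcal{N}(0, \sigma^2)$) can be explored numerically. The following is a minimal sketch, not the paper's code: the bimodal prior `mixture_pdf`, the evaluation grid, the conditioning value $z = 0.3$, and the use of a KL divergence to a moment-matched Gaussian as the closeness measure are all assumptions chosen for illustration. It computes the conditional density of $x_0$ given $x_1 = z$ on a grid and reports how far it is from a Gaussian at several noise levels.

```python
import numpy as np


def mixture_pdf(x):
    """Hypothetical prior p(x): equal mixture of N(-1, 0.5^2) and N(1, 0.5^2)."""
    def gauss(x, mean, std):
        return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))
    return 0.5 * gauss(x, -1.0, 0.5) + 0.5 * gauss(x, 1.0, 0.5)


def posterior_kl_to_gaussian(z, sigma, grid):
    """KL( p(x0 | x1 = z) || moment-matched Gaussian ), computed on a uniform grid."""
    dx = grid[1] - grid[0]
    # Bayes' rule in log space: log p(x0 | x1 = z) = log p(x0) + log N(z; x0, sigma^2) + const.
    log_w = np.log(mixture_pdf(grid)) - 0.5 * ((z - grid) / sigma) ** 2
    log_w -= log_w.max()  # stabilise before exponentiating
    w = np.exp(log_w)
    norm = w.sum() * dx
    post = w / norm
    log_post = log_w - np.log(norm)
    # Gaussian with the same mean and variance as the posterior.
    mean = (grid * post).sum() * dx
    var = ((grid - mean) ** 2 * post).sum() * dx
    log_gauss = -0.5 * (grid - mean) ** 2 / var - 0.5 * np.log(2 * np.pi * var)
    return float((post * (log_post - log_gauss)).sum() * dx)


if __name__ == "__main__":
    grid = np.linspace(-4.0, 4.0, 4001)
    for sigma in (1.0, 0.3, 0.1):
        kl = posterior_kl_to_gaussian(z=0.3, sigma=sigma, grid=grid)
        print(f"sigma={sigma:.1f}  KL to matched Gaussian = {kl:.2e}")
```

For this particular prior the divergence shrinks as $\sigma$ decreases, in line with the behaviour described in the caption of Figure 2. The direction of the KL divergence here is a convenience of the sketch and is not claimed to match the lemma's statement.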

Figures (11)

Figure 1: Probability distributions defined by diffusion forward process on one-dimensional target distribution $p_0$.
Figure 2: Illustration of Fact 1. The prior distribution $p(x_{t-1})$, leftmost, defines a joint distribution $(x_{t-1}, x_t)$ where $p(x_t \mid x_{t-1}) = \mathcal{N}(x_{t-1}, \sigma^2)$. We plot the reverse conditional distributions $p(x_{t-1} \mid x_t)$ for a fixed conditioning $x_t$ and varying noise levels $\sigma$. Notice that these distributions become close to Gaussian for small $\sigma$.
Figure 3: The intuition behind the variance-reduction claim. Given $x_t$, the final noise step $\eta_{t-\Delta t}$ is distributed identically to all other noise steps, intuitively because we only know the sum $x_t = x_0 + \sum_i \eta_i$.
Figure 4: Velocity field $v_t$ when $p_0 = \delta_{x_0}$, overlaid on the Gaussian distribution $p_t$.
Figure 5: Illustration of combining the velocity fields of two gases. Left: The density and velocity fields of two independent gases (in red and blue). Right: The effective density and velocity field of the combined gas, including streamlines.
...and 6 more figures
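
The observation in the caption of Figure 3, that conditioned on $x_t$ every noise step has the same distribution, can be checked with a small Monte Carlo experiment. This is a hedged sketch rather than anything from the paper: the two-point distribution for $x_0$, the number of steps, the step size, and the narrow window used to approximate conditioning on $x_t$ are all illustrative assumptions, and the function name `conditional_noise_means` is hypothetical.

```python
import numpy as np


def conditional_noise_means(n_samples=400_000, n_steps=5, step_std=0.5,
                            x_t_center=2.0, half_width=0.1, seed=0):
    """Estimate E[eta_i | x_t close to x_t_center] for each noise step i."""
    rng = np.random.default_rng(seed)
    # Assumed toy data distribution: x_0 is -1 or +1 with equal probability.
    x0 = rng.choice([-1.0, 1.0], size=n_samples)
    # i.i.d. Gaussian noise steps, one column per step.
    eta = rng.normal(0.0, step_std, size=(n_samples, n_steps))
    # Only the sum of the steps is visible through x_t.
    x_t = x0 + eta.sum(axis=1)
    # Approximate conditioning on x_t by keeping samples in a narrow window.
    keep = np.abs(x_t - x_t_center) < half_width
    return eta[keep].mean(axis=0)


if __name__ == "__main__":
    means = conditional_noise_means()
    for i, m in enumerate(means):
        print(f"step {i}: conditional mean of eta = {m:.3f}")
```

Up to Monte Carlo error, the conditional means agree across steps, including the final one, which is the symmetry the figure illustrates.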

Theorems & Definitions (16)

Definition 1: Reverse Sampler
Claim 1: Informal
proof: Proof of Claim 1 (Informal)
Lemma 1
Claim 2
Claim 3
proof
Claim 4: DDIM as Linear Flow; Informal
Lemma 2
proof
...and 6 more
