Step-by-Step Diffusion: An Elementary Tutorial

Preetum Nakkiran, Arwen Bradley, Hattie Zhou, Madhu Advani

TL;DR

This paper presents an accessible first course on diffusion models and flow matching for machine learning, aimed at a technical audience with no diffusion experience, and tries to simplify the mathematical details as much as possible while retaining enough precision to derive correct algorithms.

Abstract

We present an accessible first course on diffusion models and flow matching for machine learning, aimed at a technical audience with no diffusion experience. We try to simplify the mathematical details as much as possible (sometimes heuristically), while retaining enough precision to derive correct algorithms.

Paper Structure (47 sections, 4 theorems, 108 equations, 11 figures, 7 algorithms)

This paper contains 47 sections, 4 theorems, 108 equations, 11 figures, and 7 algorithms.

Key Result

Lemma 1

Let $p(x)$ be an arbitrary density over $\mathbb{R}$, with bounded 1st to 4th order derivatives. Consider the joint distribution $(x_0, x_1)$, where $x_0 \sim p$ and $x_1 \sim x_0 + \mathcal{N}(0, \sigma^2)$. Then, for any conditioning $z \in \mathbb{R}$, we have where
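A small numerical illustration of the setting of Lemma 1 (and of Figure 2), not a reproduction of the lemma's statement or proof: for a smooth 1D density, the posterior $p(x_0 \mid x_1 = z)$ should have variance close to $\sigma^2$ as $\sigma$ shrinks. The two-component Gaussian mixture target, the grid bounds, and all helper names are hypothetical. The grid posterior mean is cross-checked against Tweedie's formula $\mathbb{E}[x_0 \mid x_1 = z] = z + \sigma^2 \frac{d}{dz}\log p_{x_1}(z)$, a standard identity that is not part of the excerpt above.

```python
import math

# Hypothetical smooth 1D target p(x0): equal mixture of N(-1, 0.5^2) and N(+1, 0.5^2).
COMPONENT_MEANS = (-1.0, 1.0)
COMPONENT_STD = 0.5
COMPONENT_WEIGHT = 0.5


def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, added_var=0.0):
    """Density of x0 (added_var=0) or of x1 = x0 + N(0, added_var).

    Convolving each Gaussian component with N(0, added_var) only widens it,
    so the marginal of x1 is again a Gaussian mixture.
    """
    std = math.sqrt(COMPONENT_STD ** 2 + added_var)
    return sum(COMPONENT_WEIGHT * normal_pdf(x, m, std) for m in COMPONENT_MEANS)


def posterior_moments(z, sigma, lo=-6.0, hi=6.0, n=20001):
    """Mean and variance of p(x0 | x1 = z), by a Riemann sum over a grid of x0."""
    h = (hi - lo) / (n - 1)
    xs = [lo + i * h for i in range(n)]
    # Unnormalized posterior: prior p(x0) times likelihood N(z; x0, sigma^2).
    weights = [mixture_pdf(x) * normal_pdf(z, x, sigma) for x in xs]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, xs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, xs)) / total
    return mean, var


def tweedie_mean(z, sigma, eps=1e-5):
    """Tweedie: E[x0 | x1 = z] = z + sigma^2 * d/dz log p_{x1}(z), via a central difference."""
    def log_p1(y):
        return math.log(mixture_pdf(y, sigma ** 2))
    return z + sigma ** 2 * (log_p1(z + eps) - log_p1(z - eps)) / (2.0 * eps)


if __name__ == "__main__":
    z = 0.3
    for sigma in (1.0, 0.3, 0.1, 0.03):
        mean, var = posterior_moments(z, sigma)
        print(f"sigma={sigma:<5} var/sigma^2={var / sigma ** 2:.4f} "
              f"grid mean={mean:.5f} tweedie mean={tweedie_mean(z, sigma):.5f}")
```

As $\sigma$ decreases, the printed ratio var/sigma^2 should approach 1, while the grid mean and the Tweedie mean should agree at every $\sigma$.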

Figures (11)

  • Figure 1: Probability distributions defined by diffusion forward process on one-dimensional target distribution $p_0$.
  • Figure 2: Illustration of Fact 1. The prior distribution $p(x_{t-1})$, leftmost, defines a joint distribution $(x_{t-1}, x_t)$ where $x_t \sim x_{t-1} + \mathcal{N}(0, \sigma^2)$, i.e., $p(x_t \mid x_{t-1}) = \mathcal{N}(x_t; x_{t-1}, \sigma^2)$. We plot the reverse conditional distributions $p(x_{t-1} \mid x_t)$ for a fixed conditioning $x_t$ and varying noise levels $\sigma$. Notice that these distributions become close to Gaussian for small $\sigma$.
  • Figure 3: The intuition behind the variance-reduction claim. Given $x_t$, the final noise step $\eta_{t-\Delta t}$ is distributed identically to all other noise steps, intuitively because we only know the sum $x_t = x_0 + \sum_i \eta_i$.
  • Figure 4: Velocity field $v_t$ when $p_0 = \delta_{x_0}$, overlaid on the Gaussian distribution $p_t$.
  • Figure 5: Illustration of combining the velocity fields of two gases. Left: the density and velocity fields of two independent gases (in red and blue). Right: the effective density and velocity field of the combined gas, including streamlines.
  • ...and 6 more figures
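The symmetry argument in the Figure 3 caption can be checked by Monte Carlo. Assuming $n$ uniform steps, so that $x_t = x_0 + \sum_{i=1}^{n} \eta_i$ with i.i.d. Gaussian increments and $\Delta t = t/n$, exchangeability of the increments given $x_t$ implies $\mathbb{E}[\eta_n \mid x_t] = \frac{1}{n}\,\mathbb{E}[x_t - x_0 \mid x_t]$, where $\eta_n$ plays the role of the final step $\eta_{t-\Delta t}$; equivalently $\mathbb{E}[x_{t-\Delta t} - x_t \mid x_t] = \frac{\Delta t}{t}\,\mathbb{E}[x_0 - x_t \mid x_t]$. This sketch conditions on $x_t$ by binning; the two-point target $\{-1, +1\}$, the step size, the bin edges, and the function names are hypothetical choices for illustration.

```python
import random


def simulate(num_samples=50_000, num_steps=10, step_std=0.3, seed=0):
    """Sample x_t = x_0 + eta_1 + ... + eta_n with i.i.d. N(0, step_std^2) increments.

    Returns tuples (x_t, x_t - x_0, eta_n), where eta_n is the last noise step.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(num_samples):
        x0 = rng.choice((-1.0, 1.0))  # hypothetical two-point target distribution
        etas = [rng.gauss(0.0, step_std) for _ in range(num_steps)]
        xt = x0 + sum(etas)
        rows.append((xt, xt - x0, etas[-1]))
    return rows


def binned_comparison(rows, num_steps=10, edges=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Condition on x_t by binning; compare E[eta_n | x_t] with E[x_t - x_0 | x_t] / n."""
    results = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [r for r in rows if lo <= r[0] < hi]
        mean_last = sum(r[2] for r in in_bin) / len(in_bin)
        mean_total_over_n = sum(r[1] for r in in_bin) / len(in_bin) / num_steps
        results.append((lo, hi, mean_last, mean_total_over_n))
    return results


if __name__ == "__main__":
    for lo, hi, last, scaled in binned_comparison(simulate()):
        print(f"x_t in [{lo:+.0f}, {hi:+.0f}): E[eta_n|x_t]={last:+.4f}  "
              f"E[x_t-x_0|x_t]/n={scaled:+.4f}")
```

In every bin the two columns should agree up to Monte Carlo noise, even though $x_0$ itself is uncertain given $x_t$.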
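The combination in Figure 5 can be sketched in 1D, assuming it is the rule obtained by adding the two continuity equations $\partial_t p_i + \partial_x (p_i v_i) = 0$: the combined gas has density $p = p_1 + p_2$ and velocity $v = (p_1 v_1 + p_2 v_2)/p$, a density-weighted average of the individual fields. The two translating Gaussian "gases", their speeds, and all function names are hypothetical. The code checks by finite differences that the combined pair $(p, v)$ still satisfies the continuity equation.

```python
import math


def gaussian(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


# Two hypothetical 1D gases, each a Gaussian blob translating rigidly at a constant speed.
# For such a gas, p_i(x, t) = mass * N(x; start + speed * t, STD^2) and v_i(x, t) = speed
# satisfy the continuity equation dp_i/dt + d(p_i v_i)/dx = 0 exactly.
GASES = (  # (mass, start position, speed)
    (0.5, -2.0, 1.0),
    (0.5, 2.0, -0.5),
)
STD = 0.7


def gas_density(gas, x, t):
    mass, start, speed = gas
    return mass * gaussian(x, start + speed * t, STD)


def combined_density(x, t):
    return sum(gas_density(gas, x, t) for gas in GASES)


def combined_velocity(x, t):
    """Density-weighted average of the individual velocity fields."""
    return sum(gas_density(gas, x, t) * gas[2] for gas in GASES) / combined_density(x, t)


def continuity_residual(x, t, h=1e-4):
    """Central-difference estimate of dp/dt + d(p v)/dx for the combined gas (should be ~0)."""
    dp_dt = (combined_density(x, t + h) - combined_density(x, t - h)) / (2.0 * h)

    def flux(y):
        return combined_density(y, t) * combined_velocity(y, t)

    dflux_dx = (flux(x + h) - flux(x - h)) / (2.0 * h)
    return dp_dt + dflux_dx


if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.5, 3.0):
        print(f"x={x:+.1f}  v={combined_velocity(x, 0.0):+.4f}  "
              f"residual={continuity_residual(x, 0.5):.2e}")
```

Where only one gas has appreciable mass, the combined velocity reduces to that gas's speed; at $x = 0$, $t = 0$ the two densities are equal, so $v = (1.0 - 0.5)/2 = 0.25$.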

Theorems & Definitions (16)

  • Definition 1: Reverse Sampler
  • Claim 1: Informal
  • Proof of the main DDPM claim (Informal)
  • Lemma 1
  • Claim 2
  • Claim 3
  • Proof
  • Claim 4: DDIM as Linear Flow (Informal)
  • Lemma 2
  • Proof
  • ...and 6 more
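For a point-mass target $p_0 = \delta_{x_0}$ (the setting of Figure 4), the idea behind Claim 4 can be checked exactly. This sketch assumes a forward process $x_t = x_0 + \mathcal{N}(0, \sigma_t^2)$ with $\sigma_t = \sigma_q\sqrt{t}$ and a DDIM-style update $x_{t-\Delta t} = x_t + \lambda\,(\mathbb{E}[x_{t-\Delta t} \mid x_t] - x_t)$ with $\lambda = \sigma_t / (\sigma_{t-\Delta t} + \sigma_t)$; these are our assumptions and need not match the paper's exact notation or algorithm. With this $\lambda$, each step rescales the offset $x_t - x_0$ by $\sigma_{t-\Delta t}/\sigma_t$, so the iterate moves along a straight line toward $x_0$ and reaches it at $t = 0$. The value of `SIGMA_Q` and the helper names are hypothetical.

```python
import math

SIGMA_Q = 2.0  # hypothetical noise scale; assumed schedule sigma_t = SIGMA_Q * sqrt(t), t in [0, 1]


def sigma(t):
    return SIGMA_Q * math.sqrt(t)


def posterior_mean(xt, t, dt, x0):
    """E[x_{t-dt} | x_t] when p_0 = delta_{x0}.

    The noise x_t - x0 accumulates like a Brownian path, so
    E[x_{t-dt} - x0 | x_t] = ((t - dt) / t) * (x_t - x0).
    """
    return x0 + (t - dt) / t * (xt - x0)


def ddim_step(xt, t, dt, x0):
    """Assumed DDIM-style update x_{t-dt} = x_t + lam * (E[x_{t-dt} | x_t] - x_t).

    With lam = sigma_t / (sigma_{t-dt} + sigma_t) and dt / t = (sigma_t^2 - sigma_{t-dt}^2) / sigma_t^2,
    the step gives x_{t-dt} - x0 = (sigma_{t-dt} / sigma_t) * (x_t - x0).
    """
    lam = sigma(t) / (sigma(t - dt) + sigma(t))
    return xt + lam * (posterior_mean(xt, t, dt, x0) - xt)


def ddim_trajectory(x1, x0, num_steps):
    """Run the update from t = 1 down to t = 0 with uniform steps; return every iterate."""
    dt = 1.0 / num_steps
    xs = [x1]
    x = x1
    for k in range(num_steps, 0, -1):
        x = ddim_step(x, k * dt, dt, x0)
        xs.append(x)
    return xs


if __name__ == "__main__":
    print(ddim_trajectory(x1=3.5, x0=-1.0, num_steps=8))
```

Every iterate lies on the segment between $x_1$ and $x_0$, at $x_0 + (\sigma_t/\sigma_1)(x_1 - x_0)$; for a general (non-point-mass) target the conditional expectation would have to be learned, and this exact straight-line behavior is specific to the point mass.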