An overview of diffusion models for generative artificial intelligence
Davide Gallon, Arnulf Jentzen, Philippe von Wurstemberger
TL;DR
The article provides a rigorous mathematical treatment of denoising diffusion probabilistic models (DDPMs) for generative AI. It frames the problem through a forward diffusion process $X^{\varnothing}$ that progressively adds noise to data and a learnable backward (denoising) process $X^{\theta}$ that reconstructs data from noise. It derives the Gaussian transition dynamics of the forward and backward processes, Bayes' rule for Gaussian transitions, and a tractable training objective based on the cross-entropy $-\int \mathfrak{p}^{\varnothing}_{0}(x)\ln\bigl(\mathfrak{p}^{\theta}_{0}(x)\bigr)\,dx$ between the data density $\mathfrak{p}^{\varnothing}_{0}$ and the model density $\mathfrak{p}^{\theta}_{0}$. An upper bound on this cross-entropy decomposes into per-step terms that guide learning. The paper then surveys advanced variants (improved DDPMs, denoising diffusion implicit models (DDIMs), classifier-free diffusion guidance, and latent diffusion models such as Stable Diffusion) and highlights their gains in sample fidelity, controllability (including text and class conditioning), and sampling efficiency. By also covering architectures such as UNets with time embeddings, evaluation metrics such as the Inception Score and the Fréchet Inception Distance, and leading models such as GLIDE, DALL-E 2/3, and Imagen, the work offers a cohesive roadmap for applying diffusion-based generative systems to vision and multimodal tasks.
Abstract
This article provides a mathematically rigorous introduction to denoising diffusion probabilistic models (DDPMs), sometimes also referred to as diffusion probabilistic models or diffusion models, for generative artificial intelligence. We provide a detailed basic mathematical framework for DDPMs and explain the main ideas behind training and generation procedures. In this overview article we also review selected extensions and improvements of the basic framework from the literature such as improved DDPMs, denoising diffusion implicit models, classifier-free diffusion guidance models, and latent diffusion models.
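To make the forward diffusion $X^{\varnothing}$ concrete, here is a minimal NumPy sketch. It assumes the standard DDPM parametrization of Ho et al. (2020), in which each forward transition is $X_t = \sqrt{1-\beta_t}\,X_{t-1} + \sqrt{\beta_t}\,Z_t$ with independent standard normal $Z_t$; the article's own notation may differ. The linear schedule from $10^{-4}$ to $0.02$ over 1000 steps and the helper names (`linear_beta_schedule`, `forward_step`, `run_forward_chain`, `marginal_mean_var`) are hypothetical illustrations, not the article's code. The sketch checks that iterating the one-step Gaussian transitions reproduces the closed-form marginal of $X_t$ given $X_0 = x_0$, namely mean $\sqrt{\bar\alpha_t}\,x_0$ and variance $1-\bar\alpha_t$ with $\bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s)$.

```python
import numpy as np


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> np.ndarray:
    """Per-step noise variances beta_1, ..., beta_T.

    The linear schedule and its endpoints are an assumption for illustration.
    """
    return np.linspace(beta_start, beta_end, num_steps)


def forward_step(x_prev: np.ndarray, beta: float,
                 rng: np.random.Generator) -> np.ndarray:
    """One Gaussian forward transition: X_t = sqrt(1 - beta) X_{t-1} + sqrt(beta) Z."""
    z = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta) * x_prev + np.sqrt(beta) * z


def run_forward_chain(x0: np.ndarray, betas: np.ndarray, num_steps: int,
                      rng: np.random.Generator) -> np.ndarray:
    """Apply the first num_steps forward transitions to x0, one at a time."""
    x = x0
    for beta in betas[:num_steps]:
        x = forward_step(x, beta, rng)
    return x


def marginal_mean_var(x0: np.ndarray, betas: np.ndarray, num_steps: int):
    """Closed-form mean and variance of X_t given X_0 = x0 after num_steps steps."""
    # alpha_bar_t = prod_{s <= t} (1 - beta_s); the empty product is 1.0 when t == 0.
    alpha_bar = np.prod(1.0 - betas[:num_steps])
    return np.sqrt(alpha_bar) * x0, 1.0 - alpha_bar


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    betas = linear_beta_schedule(1000)
    x0 = np.full(50_000, 2.0)  # many independent copies of one data point
    xt = run_forward_chain(x0, betas, 100, rng)
    mean, var = marginal_mean_var(x0, betas, 100)
    print("empirical mean/var after 100 steps:", xt.mean(), xt.var())
    print("closed-form mean/var after 100 steps:", mean[0], var)
    mean_T, var_T = marginal_mean_var(x0, betas, 1000)
    print("closed-form mean/var after 1000 steps:", mean_T[0], var_T)
```

Because $\bar\alpha_t$ shrinks toward 0 as $t$ grows, the mean of $X_T$ approaches 0 and its variance approaches 1 for any starting point, so $X_T$ is approximately standard normal. This is why generation can start the backward process $X^{\theta}$ from pure Gaussian noise.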
