Foundations of Diffusion Models in General State Spaces: A Self-Contained Introduction
Vincent Pauline, Tobias Höppe, Kirill Neklyudov, Alexander Tong, Stefan Bauer, Andrea Dittadi
TL;DR
This work provides a unified, self-contained treatment of diffusion models across continuous and discrete state spaces. It starts from a discrete-time forward (noising) process, derives the exact reverse dynamics conditioned on data, and then shows that, in the limit of infinitely many steps, these converge to continuous-time formulations: SDEs for continuous spaces and CTMCs for discrete spaces. The authors present a three-step recipe (define the forward corruption, parameterize the reverse process, maximize the ELBO) and then recast the theory in a general infinitesimal-generator framework that yields equivalent forward/reverse formulations and training objectives (denoising score matching in the continuous case and denoising score entropy in the discrete case). They also discuss latent-diffusion strategies, practical reverse-process parameterizations, and extensions that bridge continuous and discrete diffusion for discrete data. The generator perspective unifies the standard diffusion literature and provides a principled path to generalizations, including piecewise-deterministic and jump processes. The result is a compact, theory-driven roadmap to modern diffusion methodology applicable to both real-valued and categorical data, and to diffusion in learned latent spaces.
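As a rough illustration of the first step of the recipe (defining the forward corruption), the sketch below samples a noised state in a continuous space and in a discrete space. It is a minimal example under stated assumptions, not the authors' implementation: the Gaussian kernel uses the common variance-preserving form, the discrete kernel is a masking (absorbing) corruption applied independently per token, and the names `gaussian_forward`, `masking_forward`, `alpha_bar_t`, `keep_prob_t`, and `mask_id` are hypothetical.

```python
import numpy as np


def gaussian_forward(x0, alpha_bar_t, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I).

    alpha_bar_t in [0, 1] is an assumed noise-schedule value: 1 keeps the
    data intact, 0 replaces it with pure Gaussian noise.
    """
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


def masking_forward(tokens, keep_prob_t, mask_id, rng):
    """Replace each token by mask_id independently with prob. 1 - keep_prob_t.

    mask_id is an assumed extra symbol outside the data alphabet that acts
    as the absorbing state.
    """
    keep = rng.random(tokens.shape) < keep_prob_t
    return np.where(keep, tokens, mask_id)


rng = np.random.default_rng(0)
x0 = np.array([1.0, -2.0, 0.5])        # toy real-valued data point
tokens = np.array([3, 1, 4, 1, 5])     # toy categorical sequence

x_t = gaussian_forward(x0, alpha_bar_t=0.5, rng=rng)
z_t = masking_forward(tokens, keep_prob_t=0.5, mask_id=-1, rng=rng)
```

In both cases the corruption level is controlled by a single scalar per time step, which is the analogy the unified treatment builds on: the reverse process is then trained to undo this corruption.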
Abstract
Although diffusion models now occupy a central place in generative modeling, introductory treatments commonly assume Euclidean data and seldom clarify their connection to discrete-state analogues. This article is a self-contained primer on diffusion over general state spaces, unifying continuous domains and discrete/categorical structures under one lens. We develop the discrete-time view (forward noising via Markov kernels and learned reverse dynamics) alongside its continuous-time limits -- stochastic differential equations (SDEs) in $\mathbb{R}^d$ and continuous-time Markov chains (CTMCs) on finite alphabets -- and derive the associated Fokker--Planck and master equations. A common variational treatment yields the ELBO that underpins standard training losses. We make explicit how forward corruption choices -- Gaussian processes in continuous spaces and structured categorical transition kernels (uniform, masking/absorbing and more) in discrete spaces -- shape reverse dynamics and the ELBO. The presentation is layered for three audiences: newcomers seeking a self-contained intuitive introduction; diffusion practitioners wanting a global theoretical synthesis; and continuous-diffusion experts looking for an analogy-first path into discrete diffusion. The result is a unified roadmap to modern diffusion methodology across continuous domains and discrete sequences, highlighting a compact set of reusable proofs, identities, and core theoretical principles.
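For orientation, the two continuous-time limits named above can be written in their standard textbook forms. The notation here is an assumption and may differ from the article's: $f$ is a drift, $g$ a scalar diffusion coefficient, $W_t$ a Brownian motion, $p_t$ the marginal law at time $t$, and $Q_t$ a rate matrix whose off-diagonal entry $Q_t(x, y)$ is the rate of jumping from $x$ to $y$ and whose rows sum to zero.

In $\mathbb{R}^d$, a forward SDE and its Fokker--Planck equation:

$$
\mathrm{d}X_t = f(X_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t,
\qquad
\partial_t p_t(x) = -\nabla \cdot \big( f(x, t)\, p_t(x) \big) + \tfrac{1}{2}\, g(t)^2\, \Delta p_t(x).
$$

On a finite alphabet, a CTMC with rate matrix $Q_t$ and its master equation:

$$
\partial_t p_t(y) = \sum_{x} p_t(x)\, Q_t(x, y).
$$

Both equations describe how the marginal $p_t$ evolves under the forward corruption, which is the sense in which the SDE and CTMC settings play parallel roles.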
