Demystifying Variational Diffusion Models
Fabio De Sousa Ribeiro, Ben Glocker
TL;DR
Demystifying Variational Diffusion Models presents a unified treatment, based on directed graphical models and variational inference, that places diffusion models within the top-down hierarchical latent variable model (HLVM) framework. It shows how a forward Gaussian diffusion process, a fixed top-down posterior, and a shared denoising/generative network yield a tractable evidence lower bound (ELBO), which diffusion losses approximate as a weighted integral over noise levels; in the continuous-time limit, a diffusion model can be viewed as an infinitely deep HLVM. The work clarifies the various parameterizations of the diffusion objective (image denoising, noise, score, energy, velocity, flow) and proves that they are equivalent through linear relationships, while detailing invariance to the forward noise schedule and practical estimation techniques. It also discusses practical choices (loss weighting, importance sampling, data augmentation) and offers guidance for future work on representation learning, broader forward processes, and causal interpretations, highlighting the balance diffusion models strike between maximum-likelihood objectives and perceptual quality.
Abstract
Despite the growing interest in diffusion models, gaining a deep understanding of the model class remains an elusive endeavour, particularly for the uninitiated in non-equilibrium statistical physics. Thanks to the rapid rate of progress in the field, most existing work on diffusion models focuses on either applications or theoretical contributions. Unfortunately, the theoretical material is often inaccessible to practitioners and new researchers, leading to a risk of superficial understanding in ongoing research. Given that diffusion models are now an indispensable tool, a clear and consolidating perspective on the model class is needed to properly contextualize recent advances in generative modelling and lower the barrier to entry for new researchers. To that end, we revisit predecessors to diffusion models like hierarchical latent variable models and synthesize a holistic perspective using only directed graphical modelling and variational inference principles. The resulting narrative is easier to follow as it imposes fewer prerequisites on the average reader relative to the view from non-equilibrium thermodynamics or stochastic differential equations.
