Denoising Diffusion Variational Inference: Diffusion Models as Expressive Variational Posteriors
Wasu Top Piriyakulkij, Yingheng Wang, Volodymyr Kuleshov
TL;DR
Denoising diffusion variational inference (DDVI) uses diffusion models in latent space as expressive variational posteriors q_phi(z|x). It derives a Markovian ELBO augmented with wake-sleep-style regularization, which enables stable, off-policy diffusion training and brings q_phi(z|x) closer to the true posterior p_theta(z|x). The method extends to semi-supervised learning and clustering, and it outperforms normalizing-flow and adversarial posteriors on MNIST, CIFAR-10, and the 1000 Genomes dataset. By modeling a diffusion trajectory over latent variables y, DDVI achieves tighter bounds and richer latent representations, with practical benefits for probabilistic programming, dimensionality reduction, and biological inference tasks such as inferring latent ancestry from genomes.
Abstract
We propose denoising diffusion variational inference (DDVI), a black-box variational inference algorithm for latent variable models which relies on diffusion models as flexible approximate posteriors. Specifically, our method introduces an expressive class of diffusion-based variational posteriors that perform iterative refinement in latent space; we train these posteriors with a novel regularized evidence lower bound (ELBO) on the marginal likelihood inspired by the wake-sleep algorithm. Our method is easy to implement (it fits a regularized extension of the ELBO), is compatible with black-box variational inference, and outperforms alternative classes of approximate posteriors based on normalizing flows or adversarial networks. We find that DDVI improves inference and learning in deep latent variable models across common benchmarks as well as on a motivating task in biology (inferring latent ancestry from human genomes), where it outperforms strong baselines on the Thousand Genomes dataset.
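
To make concrete why a more expressive approximate posterior matters, the sketch below checks the standard identity log p(x) = ELBO(q) + KL(q(z|x) || p(z|x)) on a toy model. This is not the DDVI objective or its regularizer: the conjugate Gaussian model (p(z) = N(0, 1), p(x|z) = N(z, 1)) and the function names log_marginal, elbo, and kl_to_posterior are assumptions chosen only because every quantity has a closed form. It shows that the gap between the ELBO and the marginal likelihood is exactly the KL divergence to the true posterior, so the bound tightens as q approaches p(z|x).

```python
import math

# Toy model (an illustrative assumption, not the model used by DDVI):
#   p(z) = N(0, 1),  p(x|z) = N(z, 1)
# which gives p(x) = N(0, 2) and the exact posterior p(z|x) = N(x/2, 1/2).

def log_marginal(x):
    """Exact log p(x) = log N(x; 0, 2)."""
    return -0.5 * math.log(2.0 * math.pi * 2.0) - x * x / 4.0

def elbo(x, m, s2):
    """Closed-form ELBO for a Gaussian q(z|x) = N(m, s2)."""
    # E_q[log p(x|z)], using E_q[(x - z)^2] = (x - m)^2 + s2
    exp_loglik = -0.5 * math.log(2.0 * math.pi) - 0.5 * ((x - m) ** 2 + s2)
    # E_q[log p(z)], using E_q[z^2] = m^2 + s2
    exp_logprior = -0.5 * math.log(2.0 * math.pi) - 0.5 * (m * m + s2)
    # Entropy of N(m, s2)
    entropy = 0.5 * math.log(2.0 * math.pi * math.e * s2)
    return exp_loglik + exp_logprior + entropy

def kl_to_posterior(x, m, s2):
    """KL( N(m, s2) || p(z|x) ) with p(z|x) = N(x/2, 1/2)."""
    post_mean, post_var = x / 2.0, 0.5
    return 0.5 * (math.log(post_var / s2)
                  + (s2 + (m - post_mean) ** 2) / post_var
                  - 1.0)

if __name__ == "__main__":
    x = 1.5
    # A crude posterior (the prior) versus the exact posterior.
    for name, (m, s2) in {"prior as q": (0.0, 1.0),
                          "exact posterior as q": (x / 2.0, 0.5)}.items():
        gap = log_marginal(x) - elbo(x, m, s2)
        print(f"{name}: ELBO gap = {gap:.6f}, "
              f"KL to posterior = {kl_to_posterior(x, m, s2):.6f}")
```

In this toy case a single Gaussian can match the posterior exactly, so the gap can be driven to zero. In deep latent variable models the true posterior is generally not Gaussian, which is the setting where richer posterior families such as the diffusion-based ones proposed here are intended to help.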
