Neural Diffusion Models
Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth
TL;DR
This work addresses the rigidity of fixed forward processes in diffusion models by introducing Neural Diffusion Models (NDMs), which learn time-dependent nonlinear data transformations $F_\varphi(\mathbf{x}, t)$ to adapt the forward path. It provides a simulation-free variational objective and a continuous-time SDE/ODE formulation for the reverse process, enabling fast inference with standard solvers. Empirically, NDMs improve log-likelihood on CIFAR-10, downsampled ImageNet, and CelebA-HQ, while maintaining high-quality samples, and can learn simple dynamics such as dynamic optimal transport. The framework unifies and extends existing diffusion models, offering a density-estimation-friendly, flexible generative paradigm with broad applicability to compression, semi-supervised learning, and purification.
Abstract
Diffusion models have shown remarkable performance on many generative tasks. Despite recent success, most diffusion models are restricted in that they only allow linear transformations of the data distribution. In contrast, a broader family of transformations can potentially help train generative distributions more efficiently, simplifying the reverse process and closing the gap between the true negative log-likelihood and the variational approximation. In this paper, we present Neural Diffusion Models (NDMs), a generalization of conventional diffusion models that enables defining and learning time-dependent non-linear transformations of data. We show how to optimise NDMs using a variational bound in a simulation-free setting. Moreover, we derive a time-continuous formulation of NDMs, which allows fast and reliable inference using off-the-shelf numerical ODE and SDE solvers. Finally, we demonstrate the utility of NDMs with learnable transformations through experiments on standard image generation benchmarks, including CIFAR-10, downsampled versions of ImageNet, and CelebA-HQ. NDMs outperform conventional diffusion models in terms of likelihood and produce high-quality samples.
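To make the idea of a time-dependent non-linear transformation concrete, the following is a minimal sketch, not the authors' implementation. It assumes the forward marginal is Gaussian with mean $\alpha_t F_\varphi(\mathbf{x}, t)$ and standard deviation $\sigma_t$, so that setting $F_\varphi$ to the identity gives the familiar linear case $\mathbf{z}_t = \alpha_t \mathbf{x} + \sigma_t \boldsymbol{\varepsilon}$. The cosine schedule, the toy function `F_toy`, and its scalar parameter `w` are hypothetical stand-ins chosen for illustration; in practice $F_\varphi$ would be a neural network.

```python
import numpy as np

def alpha_sigma(t):
    # Assumed variance-preserving schedule (illustrative choice):
    # alpha_t^2 + sigma_t^2 = 1, with alpha_0 = 1 and sigma_0 = 0.
    alpha = np.cos(0.5 * np.pi * t)
    sigma = np.sin(0.5 * np.pi * t)
    return alpha, sigma

def F_toy(x, t, w):
    # Hypothetical stand-in for the learned transformation F_phi(x, t).
    # It is the identity at t = 0 and for w = 0, and non-linear otherwise.
    return x + t * np.tanh(w * x)

def sample_forward(x, t, w, rng):
    # Draw z_t ~ N(alpha_t * F(x, t), sigma_t^2 I) in a single step,
    # without simulating intermediate times.
    alpha, sigma = alpha_sigma(t)
    eps = rng.standard_normal(x.shape)
    return alpha * F_toy(x, t, w) + sigma * eps

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 2.0])
z = sample_forward(x, 0.3, w=1.5, rng=rng)
```

Because `z` is drawn directly for any `t`, a training loop built on this marginal would not need to simulate the forward process, which is one way to read the "simulation-free" setting described above.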
