Neural Flow Diffusion Models: Learnable Forward Process for Improved Diffusion Modelling
Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth
TL;DR
Neural Flow Diffusion Models (NFDM) replace fixed forward diffusion with a learnable forward transform, enabling a broad family of latent dynamics while preserving likelihood-based training. The approach defines $q_{\varphi}(\boldsymbol{z}_t|\boldsymbol{x})$ via an invertible $F_{\varphi}(\boldsymbol{\nu},t,\boldsymbol{x})$, yielding a variational bound on $-\log p_{\theta,\varphi}(\boldsymbol{x})$ and allowing end-to-end optimization that improves likelihoods on CIFAR-10 and ImageNet benchmarks. The framework supports deterministic sampling, non-Gaussian forward distributions, and bridges between distributions, and introduces restrictions such as curvature penalties to learn trajectories with desired properties (e.g., straight lines). Empirically, NFDM achieves state-of-the-art likelihoods, demonstrates faster few-step generation with NFDM-OT, and learns interpretable generative dynamics, including distribution bridges on AFHQ, albeit with increased training complexity. Overall, NFDM offers a versatile platform for designing and optimizing flexible forward processes in diffusion-based generative modeling with potential broad applicability.
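To make the idea of a learnable, invertible forward transform concrete, the following is a minimal sketch, assuming a scalar latent and a transform that is affine in the noise, $z_t = \mu(x,t) + \sigma(t)\,\nu$ with $\nu \sim \mathcal{N}(0,1)$. The functions `mu` and `sigma` and the parameters `a` and `b` are hypothetical stand-ins for a neural network; they are not the paper's architecture. Because the map is invertible in $\nu$ for fixed $(t, x)$, the density $q(z_t|x)$ follows from the change-of-variables formula.

```python
import math
import random

def mu(x, t, a):
    # Hypothetical learnable mean. For any a: mu(x, 0) = x and mu(x, 1) = 0.
    return (1.0 - t) * x + a * t * (1.0 - t) * x

def sigma(t, b):
    # Hypothetical learnable scale, strictly positive so the map is invertible.
    # Small at t = 0 (close to the data), about 1 at t = 1 (close to the prior).
    return 1e-2 + (1.0 - 1e-2) * t ** math.exp(b)

def forward(nu, t, x, a, b):
    # z_t = F(nu, t, x): affine in nu, hence invertible for fixed (t, x).
    return mu(x, t, a) + sigma(t, b) * nu

def inverse(z, t, x, a, b):
    # Recover the noise nu that produced z at time t for data point x.
    return (z - mu(x, t, a)) / sigma(t, b)

def log_q(z, t, x, a, b):
    # Change of variables: log q(z | x) = log N(nu; 0, 1) - log |dF/dnu|.
    nu = inverse(z, t, x, a, b)
    return -0.5 * nu * nu - 0.5 * math.log(2.0 * math.pi) - math.log(sigma(t, b))

# Usage: sample a latent, invert it, and evaluate its log-density.
rng = random.Random(0)
x, t, a, b = 1.5, 0.4, 0.3, -0.2
nu = rng.gauss(0.0, 1.0)
z = forward(nu, t, x, a, b)
nu_back = inverse(z, t, x, a, b)
log_density = log_q(z, t, x, a, b)
```

Setting `a = 0` and fixing `b` gives a fixed, hand-designed process in this toy. Letting `a` and `b` be trained alongside the reverse model is the kind of flexibility the learnable forward process is meant to provide.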
Abstract
Conventional diffusion models typically rely on a fixed forward process, which implicitly defines complex marginal distributions over latent variables. This can often complicate the reverse process's task of learning generative trajectories, and results in costly inference for diffusion models. To address these limitations, we introduce Neural Flow Diffusion Models (NFDM), a novel framework that enhances diffusion models by supporting a broader range of forward processes beyond the standard Gaussian. We also propose a novel parameterization technique for learning the forward process. Our framework provides an end-to-end, simulation-free optimization objective, effectively minimizing a variational upper bound on the negative log-likelihood. Experimental results demonstrate NFDM's strong performance, evidenced by state-of-the-art likelihood estimation. Furthermore, we investigate NFDM's capacity for learning generative dynamics with specific characteristics, such as deterministic straight-line trajectories, and demonstrate how the framework may be adopted for learning bridges between two distributions. The results underscore NFDM's versatility and its potential for a wide range of applications.
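Why straight-line trajectories reduce inference cost can be illustrated with a toy calculation. The sketch below is not NFDM itself: it assumes a single scalar data point `x` and a single prior sample `z1` (both hypothetical values), and integrates the known velocity along two hand-written paths with fixed-step Euler updates from $t=1$ to $t=0$. On the straight path the velocity is constant, so one Euler step is exact; on the curved path the error shrinks only as the number of steps grows.

```python
import math

def euler_integrate(velocity, z_start, n_steps):
    # Integrate dz/dt = velocity(t) backward from t = 1 to t = 0 with fixed steps.
    z, dt = z_start, 1.0 / n_steps
    for k in range(n_steps):
        t = 1.0 - k * dt
        z -= dt * velocity(t)
    return z

x, z1 = 2.0, -1.0  # toy data point and toy prior sample (hypothetical values)

def straight_velocity(t):
    # Path z_t = (1 - t) * x + t * z1 has constant velocity z1 - x.
    return z1 - x

def curved_velocity(t):
    # Path z_t = cos(pi t / 2) * x + sin(pi t / 2) * z1 has time-varying velocity.
    h = math.pi / 2.0
    return h * (-math.sin(h * t) * x + math.cos(h * t) * z1)

# Distance between the integrated endpoint and the true data point x.
err_straight_1 = abs(euler_integrate(straight_velocity, z1, 1) - x)
err_curved_1 = abs(euler_integrate(curved_velocity, z1, 1) - x)
err_curved_100 = abs(euler_integrate(curved_velocity, z1, 100) - x)
```

In this toy setting `err_straight_1` vanishes after a single step, while the curved path needs many steps to reach comparable accuracy. This is the intuition behind penalizing curvature when few-step generation is the goal.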
