Diffusion Models with Deterministic Normalizing Flow Priors
Mohsen Zand, Ali Etemad, Michael Greenspan
TL;DR
This work introduces DiNof, a diffusion model variant that uses a data-dependent, deterministic prior learned by a normalizing flow to accelerate sampling and improve sample quality. A linear forward SDE runs only up to a cutoff time $T_m$, and a nonlinear deterministic flow then maps $x(T_m)$ to a latent $z$ with density $p_\theta(z)$. At generation time, DiNof seeds the reverse diffusion from this informative prior and completes generation with the remaining stochastic steps. Empirically, DiNof achieves state-of-the-art or competitive FID/IS scores on CIFAR-10 (FID 2.01, IS 9.96) and CelebA-HQ-256 (FID 7.11), while reducing sampling time (e.g., 24s vs 43s) and allowing a tunable trade-off between determinism and randomness. The approach remains compatible with standard diffusion training and targets both unconditional and conditional generation.
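As a rough guide to the pipeline summarized above, the three stages can be written in generic score-based SDE notation. The symbols $f(t)$, $g(t)$, $F_\theta$, and $\bar{w}$ are illustrative assumptions, not necessarily the paper's own notation.

$$
\begin{aligned}
\text{forward diffusion:}\quad & dx = f(t)\,x\,dt + g(t)\,dw, \qquad t \in [0, T_m] \\
\text{flow prior:}\quad & z = F_\theta\big(x(T_m)\big), \qquad \log p\big(x(T_m)\big) = \log p_\theta(z) + \log\left|\det \frac{\partial F_\theta}{\partial x(T_m)}\right| \\
\text{generation:}\quad & z \sim p_\theta(z), \quad x(T_m) = F_\theta^{-1}(z), \quad dx = \big[f(t)\,x - g(t)^2\,\nabla_x \log p_t(x)\big]\,dt + g(t)\,d\bar{w}, \quad t: T_m \to 0
\end{aligned}
$$

The second line is the standard change-of-variables identity for an invertible map; the third is the usual reverse-time SDE, started from $T_m$ rather than from the end of a full diffusion.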
Abstract
For faster sampling and higher sample quality, we propose DiNof ($\textbf{Di}$ffusion with $\textbf{No}$rmalizing $\textbf{f}$low priors), a technique that combines normalizing flows and diffusion models. We use normalizing flows to parameterize the noisy data at an arbitrary step of the diffusion process and use the result as the prior in the reverse diffusion process. More specifically, the forward noising process turns the data distribution into partially noisy data, which are subsequently transformed into a Gaussian distribution by a nonlinear process. The backward denoising procedure begins with a prior created by sampling from the Gaussian distribution and applying the invertible normalizing flow transformations deterministically. To generate the data distribution, the prior then undergoes the remaining stochastic diffusion denoising steps. By reducing the total number of diffusion steps, we speed up both the forward and backward processes. More importantly, we improve the expressive power of diffusion models by employing both deterministic and stochastic mappings. Experiments on standard image generation datasets demonstrate the advantage of the proposed method over existing approaches. On the unconditional CIFAR-10 dataset, for example, we achieve an FID of 2.01 and an Inception score of 9.96. Our method also demonstrates competitive performance on the CelebA-HQ-256 dataset, where it obtains an FID of 7.11. Code is available at https://github.com/MohsenZand/DiNof.
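The sampling procedure described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation (see the repository linked above for that); it assumes a 1-D Gaussian data distribution, a constant-rate variance-preserving SDE, an exact analytic score, and an affine map standing in for a learned normalizing flow. All names (`BETA`, `flow_forward`, `flow_inverse`, `sample`, and so on) are hypothetical and chosen for illustration only.

```python
import numpy as np

BETA = 4.0        # constant noise rate (assumption; a real schedule is usually time-varying)
MU, S = 2.0, 0.5  # toy 1-D "data" distribution N(MU, S^2)

def alpha(t):
    """Signal scaling of the forward SDE dx = -0.5*BETA*x dt + sqrt(BETA) dw."""
    return np.exp(-0.5 * BETA * t)

def sigma2(t):
    """Noise variance added by the forward SDE up to time t."""
    return 1.0 - np.exp(-BETA * t)

def marginal_stats(t):
    """Mean and std of x(t) when x(0) ~ N(MU, S^2)."""
    a = alpha(t)
    return a * MU, np.sqrt(a**2 * S**2 + sigma2(t))

def forward_diffuse(x0, t, rng):
    """Sample x(t) given x(0) using the closed-form perturbation kernel."""
    return alpha(t) * x0 + np.sqrt(sigma2(t)) * rng.standard_normal(x0.shape)

def flow_forward(x_tm, t_m):
    """Toy invertible 'flow': maps partially noisy x(T_m) to a standard normal z."""
    m, sd = marginal_stats(t_m)
    return (x_tm - m) / sd

def flow_inverse(z, t_m):
    """Deterministic inverse of the toy flow: z -> x(T_m)."""
    m, sd = marginal_stats(t_m)
    return m + sd * z

def score(x, t):
    """Exact score of the Gaussian marginal p_t (a trained network in practice)."""
    m, sd = marginal_stats(t)
    return -(x - m) / sd**2

def sample(n, t_m=0.5, n_steps=500, seed=0):
    """Draw z, invert the flow to get the prior at T_m, then denoise stochastically to t=0."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    x = flow_inverse(z, t_m)  # deterministic part of generation
    dt = t_m / n_steps
    for i in range(n_steps):  # Euler-Maruyama on the reverse-time SDE
        t = t_m - i * dt
        drift = 0.5 * BETA * x + BETA * score(x, t)
        x = x + drift * dt + np.sqrt(BETA * dt) * rng.standard_normal(n)
    return x
```

Calling `sample(20000)` should yield samples whose mean and standard deviation are close to `MU` and `S`. Lowering `t_m` shortens the stochastic part and leans more on the deterministic map, which is the trade-off between determinism and randomness that the method exposes.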
