Diffusion Models with Deterministic Normalizing Flow Priors

Mohsen Zand; Ali Etemad; Michael Greenspan

TL;DR

This work introduces DiNof, a diffusion model variant that incorporates a data-dependent, deterministic prior learned through a normalizing flow to accelerate sampling and boost sample quality. By using a linear forward SDE up to a cutoff time $T_m$ and a nonlinear deterministic flow to map $x(T_m)$ to a latent $z$ with density $p_\theta(z)$, DiNof seeds the reverse diffusion from an informative prior and then completes the generation with remaining stochastic steps. Empirically, DiNof achieves state-of-the-art or competitive FID/IS scores on CIFAR-10 (FID 2.01, IS 9.96) and CelebA-HQ-256 (FID 7.11), while significantly reducing sampling time (e.g., 24s vs 43s) and allowing a tunable trade-off between determinism and randomness. The approach preserves compatibility with standard diffusion training and offers a practical, data-driven path to faster, higher-fidelity generative modeling with both unconditional and conditional applications in mind.
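The pipeline summarized above can be written schematically as follows. This is a sketch under assumed notation rather than the paper's own equations: the drift coefficient $f(t)$, the diffusion coefficient $g(t)$, and the flow map $F_\theta$ are symbols introduced here for illustration. The only facts used are the standard change-of-variables rule for invertible maps and the standard reverse-time form of a forward SDE with linear drift.

$$
\begin{aligned}
&\text{forward, } t \in [0, T_m]: && dx = f(t)\,x\,dt + g(t)\,dw \\
&\text{flow at } T_m: && z = F_\theta\big(x(T_m)\big), \qquad \log p\big(x(T_m)\big) = \log p_\theta(z) + \log\left|\det \frac{\partial F_\theta}{\partial x(T_m)}\right| \\
&\text{sampling prior:} && z \sim p_\theta(z), \qquad x(T_m) = F_\theta^{-1}(z) \\
&\text{reverse, } t: T_m \to 0: && dx = \big[f(t)\,x - g(t)^2\,\nabla_x \log p_t(x)\big]\,dt + g(t)\,d\bar{w}
\end{aligned}
$$

Read this way, a smaller $T_m$ leaves fewer stochastic steps and gives the deterministic flow more of the work, which is the trade-off between determinism and randomness mentioned above.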

Abstract

For faster sampling and higher sample quality, we propose DiNof ($\textbf{Di}$ffusion with $\textbf{No}$rmalizing $\textbf{f}$low priors), a technique that combines normalizing flows and diffusion models. We use normalizing flows to parameterize the noisy data at an arbitrary step of the diffusion process and use the result as the prior in the reverse diffusion process. More specifically, the forward noising process turns a data distribution into partially noisy data, which are subsequently transformed into a Gaussian distribution by a nonlinear process. The backward denoising procedure begins with a prior created by sampling from the Gaussian distribution and applying the invertible normalizing flow transformations deterministically. To generate the data distribution, the prior then undergoes the remaining stochastic diffusion denoising procedure. By reducing the total number of diffusion steps, we are able to speed up both the forward and backward processes. More importantly, we improve the expressive power of diffusion models by employing both deterministic and stochastic mappings. Experiments on standard image generation datasets demonstrate the advantage of the proposed method over existing approaches. On the unconditional CIFAR-10 dataset, for example, we achieve an FID of 2.01 and an Inception score of 9.96. Our method also demonstrates competitive performance on the CelebA-HQ-256 dataset, where it obtains an FID score of 7.11. Code is available at https://github.com/MohsenZand/DiNof.
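The procedure described in the abstract can be illustrated with a small toy example. This is a minimal sketch under several assumptions, not the authors' implementation: the data are a Gaussian (so the score is known in closed form and stands in for a trained score network), an elementwise affine map stands in for the Glow flow, and a discrete variance-preserving noise schedule is assumed. All function and class names (`make_schedule`, `diffuse_to`, `AffineFlow`, `gaussian_score`, `sample`) are hypothetical.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_min=1e-4, beta_max=2e-2):
    """Assumed linear variance-preserving schedule (not taken from the paper)."""
    betas = np.linspace(beta_min, beta_max, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def diffuse_to(x0, step, alpha_bars, rng):
    """Forward noising in closed form up to the 1-indexed `step` (the cutoff T_m)."""
    ab = alpha_bars[step - 1]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * rng.standard_normal(x0.shape)

class AffineFlow:
    """Toy invertible map x <-> z; a stand-in for a real flow such as Glow."""
    def fit(self, x):
        self.shift = x.mean(axis=0)
        self.log_scale = np.log(x.std(axis=0))
        return self
    def forward(self, x):   # partially noisy data -> approximately N(0, I)
        return (x - self.shift) * np.exp(-self.log_scale)
    def inverse(self, z):   # Gaussian latent -> prior at the cutoff step
        return z * np.exp(self.log_scale) + self.shift

def gaussian_score(x, step, alpha_bars, mu, sigma):
    """Exact score of the noisy marginal when the data are N(mu, sigma^2 I)."""
    ab = alpha_bars[step - 1]
    mean = np.sqrt(ab) * mu
    var = ab * sigma**2 + (1.0 - ab)
    return -(x - mean) / var

def sample(flow, n, dim, t_m, betas, alpha_bars, mu, sigma, rng):
    """Deterministic flow prior at t_m, then the remaining stochastic reverse steps."""
    z = rng.standard_normal((n, dim))
    x = flow.inverse(z)                      # deterministic part
    for step in range(t_m, 0, -1):           # stochastic part: t_m, ..., 1
        beta = betas[step - 1]
        score = gaussian_score(x, step, alpha_bars, mu, sigma)
        x = (x + beta * score) / np.sqrt(1.0 - beta)
        if step > 1:                         # no noise on the final step
            x = x + np.sqrt(beta) * rng.standard_normal(x.shape)
    return x

# Toy run: only 300 of 1000 steps are diffused; the flow covers the rest.
rng = np.random.default_rng(0)
mu, sigma, dim, t_m = 2.0, 0.5, 2, 300
betas, alpha_bars = make_schedule()
x0 = mu + sigma * rng.standard_normal((5000, dim))
x_tm = diffuse_to(x0, t_m, alpha_bars, rng)
flow = AffineFlow().fit(x_tm)
samples = sample(flow, 5000, dim, t_m, betas, alpha_bars, mu, sigma, rng)
print("sample mean:", samples.mean(), "sample std:", samples.std())
```

In this toy setting the reverse process runs 300 steps instead of 1000, which mirrors the claimed source of the speed-up. With real images the affine map would be far too weak, which is where an expressive flow would come in.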

Paper Structure (16 sections, 8 equations, 7 figures, 4 tables)

Introduction
Related Work
Method
Background
Diffusion Models
Normalizing Flows
Proposed Method
Experiments
Protocols and Datasets
Results
Model Parameters
Unconditional Color Image Generation
Sampling Time
Intermediate Results
Qualitative Results
...and 1 more sections

Figures (7)

Figure 1: Uncurated samples generated by DiNof on the CelebA-HQ-256 (left) and CIFAR-10 (middle) datasets. The right panel plots sample quality, measured by FID, against the number of diffusion/sampling steps for different diffusion-based generative models. Unlike several other methods, DiNof speeds up the process while improving sample quality.
Figure 2: Architectural comparison of Diffusion, Normalizing Flows, DiffFlow [zhang2021diffusion], and the proposed DiNof models. DiffFlow uses stochastic and trainable processes for both the forward and the backward processes, whereas DiNof uses a deterministic trainable process only at the final steps of the forward process. The backward process starts with a deterministic process and then switches to a stochastic process to generate images. We use $T_m$ to denote an arbitrary intermediate latent variable between data space and latent space.
Figure 3: An overview of DiNof. It employs both linear stochastic and nonlinear deterministic trajectories in the mapping between data space and latent space, using SDEs and ODEs. It thus uses normalizing flows to make the diffusion model nonlinear. The Glow [kingma2018glow] architecture is used as the normalizing flow model.
Figure 4: CIFAR-10 samples with different $T_m$ thresholds.
Figure 5: Visual samples for various iteration numbers during training on the CelebA-HQ-256 dataset.
...and 2 more figures
