
Diffusion Density Estimators

Akhil Premkumar

TL;DR

The paper investigates diffusion models as neural density estimators for the data distribution $p_d$ and points out limitations of the standard Probability Flow ODE approach. It introduces a parallelizable path-integral Monte Carlo method that estimates log-densities without solving a flow, moving derivatives onto a Gaussian transition term and enabling vectorized computation. Through experiments on Gaussian mixtures, it studies how the training parameters ($N$, $n_t$, $n_{ep}$) and the choice of entropy matching versus score matching influence accuracy and efficiency, finding entropy matching to be faster and more memory-efficient. The work demonstrates a viable, scalable diffusion-based density estimator and outlines directions for applying it to higher-dimensional data and inference tasks.

Abstract

We investigate the use of diffusion models as neural density estimators. The current approach to this problem involves converting the generative process to a smooth flow, known as the Probability Flow ODE. The log density at a given sample can be obtained by solving the ODE with a black-box solver. We introduce a new, highly parallelizable method that computes log densities without the need to solve a flow. Our approach is based on estimating a path integral by Monte Carlo, in a manner identical to the simulation-free training of diffusion models. We also study how different training parameters affect the accuracy of the density calculation, and offer insights into how these models can be made more scalable and efficient.

Paper Structure

This paper contains 16 sections, 33 equations, 7 figures.

Figures (7)

  • Figure 1: Log densities computed from diffusion models using the path integral approach (see \ref{sec:DensityEstimation}). $p_{\rm d}$ is the true data distribution, a mixture of six Gaussians in $D=9$ dimensions, and $p_{\rm DM}$ is the density computed from the diffusion model. All models were trained with parameters $N = 8192$, $n_{\rm t} = 10$, $n_{\rm ep} = 200$ (see \ref{sec:Experiments}).
  • Figure 2: A schematic of the forward and reverse diffusion processes.
  • Figure 3: The Monte Carlo estimate of the path integral, \ref{eq:PathIntegralMC}, propagates $y_{\rm d} \sim p_{\rm d}$ to several random instants of time by sampling the transition kernel $p(y_s, s|y_{\rm d},0)$. This allows us to reach the encircled points in one large jump (the dashed lines), thereby avoiding a full simulation of the stochastic trajectories. The accuracy of the MC estimate improves as the number of such 'throws' increases. See \ref{sec:FiniteNumberEffects}.
  • Figure 4: The effect of training parameters on model performance.
  • Figure 5: Interplay between number of throws and training epochs. $N=8192$.
  • ...and 2 more figures
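
The 'throws' described in the Figure 3 caption can be illustrated with a minimal sketch. It is not the paper's estimator: the forward process is assumed here to be an Ornstein-Uhlenbeck process $dy = -\tfrac{1}{2} y\,dt + dW$, whose transition kernel is the Gaussian $\mathcal{N}(y_{\rm d} e^{-s/2}, (1 - e^{-s}) I)$, and the integrand $f(y, s) = |y|^2$ is a stand-in chosen only because its time integral has a closed form to compare against. The function names (`sample_kernel`, `mc_time_integral`, `exact_second_moment_integral`) are hypothetical. The point is that each throw jumps directly from $y_{\rm d}$ to a random time $s$, so all throws are drawn in one vectorized call with no trajectory simulation.

```python
import numpy as np


def sample_kernel(y_d, s, rng):
    """Draw one y_s per time in s from the assumed OU kernel p(y_s, s | y_d, 0).

    Assumption: dy = -y/2 dt + dW, so y_s ~ N(y_d * exp(-s/2), (1 - exp(-s)) I).
    y_d has shape (D,), s has shape (n,), and the result has shape (n, D).
    """
    mean = np.exp(-0.5 * s)[:, None] * y_d[None, :]
    std = np.sqrt(1.0 - np.exp(-s))[:, None]
    return mean + std * rng.standard_normal(mean.shape)


def mc_time_integral(f, y_d, T, n_throws, rng):
    """Monte Carlo estimate of int_0^T E[f(y_s, s)] ds using n_throws jumps."""
    s = rng.uniform(0.0, T, size=n_throws)   # random instants of time
    y_s = sample_kernel(y_d, s, rng)         # one large jump per throw
    return T * np.mean(f(y_s, s))            # uniform-in-time importance weight is T


def exact_second_moment_integral(y_d, T):
    """Closed form of int_0^T E|y_s|^2 ds under the assumed OU process."""
    D = y_d.size
    decay = 1.0 - np.exp(-T)
    return float(y_d @ y_d) * decay + D * (T - decay)


rng = np.random.default_rng(0)
y_d = 2.0 * np.ones(9)                       # a single 'data' point in D = 9
T = 2.0


def squared_norm(y, s):
    # Stand-in integrand; a real estimator would evaluate the trained model here.
    return np.sum(y**2, axis=1)


estimate = mc_time_integral(squared_norm, y_d, T, n_throws=100_000, rng=rng)
exact = exact_second_moment_integral(y_d, T)
print(f"MC estimate: {estimate:.3f}, closed form: {exact:.3f}")
```

Increasing `n_throws` shrinks the Monte Carlo error at the usual $1/\sqrt{n}$ rate, which is the finite-number effect the caption alludes to.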