Score-based deterministic density sampling

Vasily Ilin, Peter Sushko, Jingwei Hu

TL;DR

The paper tackles sampling from an unnormalized density $\pi$ when only the score $\nabla \log \pi$ is available, proposing Score-Based Transport Modeling (SBTM) as a deterministic counterpart to diffusion-based methods such as Langevin dynamics. SBTM couples a particle system with a time-varying score network $s^{\Theta_t}$, trained via score matching to approximate the score $\nabla \log f_t$ of the current particle density $f_t$: particles evolve by $\dot X_t=\nabla \log \pi(X_t)-s^{\Theta_t}(X_t)$, while $\Theta_t$ is updated to minimize the score-matching loss $L(s^{\Theta_t}, f_t)$. The authors prove entropy-dissipation guarantees for the coupled dynamics, show that a small score-matching loss yields near-optimal convergence rates under a log-Sobolev condition, and extend the results to annealed dynamics. Empirically, SBTM exhibits smooth trajectories, optimal or near-optimal convergence rates, and strong sample efficiency across low-dimensional, multimodal, and high-dimensional targets, including MNIST in $784$ dimensions, while memory and runtime scale linearly in the sample size. This approach provides a practical, deterministic alternative to Langevin dynamics with interpretable convergence diagnostics and effective high-dimensional performance.
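The coupled loop described above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions that are ours, not the paper's: a 1D standard-normal target, a linear score model $s(x) = a x + b$ standing in for the neural network $s^{\Theta_t}$, explicit Euler time stepping, and an exact minimization of the implicit (Hyvärinen) score-matching loss in place of gradient steps. The exact minimization is possible only because that loss is quadratic in $(a, b)$. With a Gaussian target and a Gaussian initial density, $f_t$ stays Gaussian, so the linear model can represent $\nabla \log f_t$ exactly. The function names are hypothetical.

```python
import numpy as np


def grad_log_target(x):
    # Hypothetical example target pi = N(0, 1), so grad log pi(x) = -x.
    return -x


def fit_linear_score(x):
    # Minimize the implicit score-matching loss
    #   L(a, b) = mean( s(x)^2 + 2 * s'(x) ),  s(x) = a*x + b,  s'(x) = a,
    # which equals mean((s - grad log f)^2) up to a constant independent of (a, b).
    # L is quadratic, so setting its gradient to zero gives a 2x2 linear system.
    m1, m2 = x.mean(), (x ** 2).mean()
    A = np.array([[m2, m1], [m1, 1.0]])
    rhs = np.array([-1.0, 0.0])
    return np.linalg.solve(A, rhs)  # (a, b) = (-1/var, mean/var)


def sbtm_1d(x0, dt=0.01, n_steps=500):
    x = x0.copy()
    theta = fit_linear_score(x)
    for _ in range(n_steps):
        # 1) Re-fit the score model to the current particles (training on the fly).
        theta = fit_linear_score(x)
        a, b = theta
        # 2) Euler step of the deterministic ODE dX/dt = grad log pi(X) - s_theta(X).
        x = x + dt * (grad_log_target(x) - (a * x + b))
    return x, theta


rng = np.random.default_rng(0)
x0 = rng.normal(loc=3.0, scale=0.25, size=2000)  # initial particles from N(3, 0.25^2)
x_T, theta_T = sbtm_1d(x0)
print(x_T.mean(), x_T.var())  # mean near 0, variance near 1
```

Calling `sbtm_1d(x0)` twice returns identical particles because no noise is injected. A Langevin sampler would instead draw fresh Gaussian noise at every step.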

Abstract

We propose a deterministic sampling framework using Score-Based Transport Modeling for sampling an unnormalized target density $\pi$ given only its score $\nabla \log \pi$. Our method approximates the Wasserstein gradient flow on $\mathrm{KL}(f_t\|\pi)$ by learning the time-varying score $\nabla \log f_t$ on the fly using score matching. While it has the same marginal distributions as Langevin dynamics, our method produces smooth deterministic trajectories, resulting in monotone, noise-free convergence. We prove that our method dissipates relative entropy at the same rate as the exact gradient flow, provided sufficient training. Numerical experiments validate our theoretical findings: our method converges at the optimal rate, has smooth trajectories, and is often more sample efficient than its stochastic counterpart. Experiments on high-dimensional image data show that our method produces high-quality generations in as few as 15 steps and exhibits natural exploratory behavior. The memory and runtime scale linearly in the sample size.
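The claim that the deterministic method shares Langevin's marginals follows from a standard identity. The sketch below is a textbook derivation, not copied from the paper, and assumes the usual form of the Langevin SDE $dX_t = \nabla \log \pi(X_t)\,dt + \sqrt{2}\,dW_t$:

```latex
% Fokker-Planck equation of the Langevin SDE, rewritten as a continuity equation.
\begin{aligned}
\partial_t f_t
  &= -\nabla\cdot\big(f_t\,\nabla\log\pi\big) + \Delta f_t
  && \text{(Fokker-Planck equation of the SDE)}\\
  &= -\nabla\cdot\big(f_t\,\nabla\log\pi\big) + \nabla\cdot\big(f_t\,\nabla\log f_t\big)
  && \text{(since } \Delta f = \nabla\cdot(f\,\nabla\log f)\text{)}\\
  &= -\nabla\cdot\Big(f_t\,\big(\nabla\log\pi - \nabla\log f_t\big)\Big).
\end{aligned}
```

The last line is the continuity equation for the ODE $\dot X_t = \nabla \log \pi(X_t) - \nabla \log f_t(X_t)$. So, under standard regularity assumptions and from the same initial density, the ODE and the SDE produce the same marginals $f_t$. SBTM replaces the unknown $\nabla \log f_t$ with the learned $s^{\Theta_t}$.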

Paper Structure

This paper contains 15 sections, 5 theorems, 36 equations, 13 figures.

Key Result

Theorem 3.1

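If $f_t$ follows the Wasserstein gradient flow on $\operatorname{KL}\!\left(\cdot\,\|\,\pi\right)$, then relative entropy dissipates at the rate of the relative Fisher information:

$$\frac{d}{dt}\operatorname{KL}\!\left(f_t\,\|\,\pi\right) = -\operatorname{F}\!\left(f_t\,\|\,\pi\right), \qquad \operatorname{F}\!\left(f\,\|\,\pi\right) = \int f\,\left|\nabla \log f - \nabla \log \pi\right|^2 dx.$$

Additionally, if $\pi$ satisfies the log-Sobolev inequality with constant $\alpha$ (e.g. if $\pi$ is $\alpha$-log-concave), then

$$\operatorname{KL}\!\left(f_t\,\|\,\pi\right) \le e^{-2\alpha t}\,\operatorname{KL}\!\left(f_0\,\|\,\pi\right).$$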
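Theorem 3.1 can be sanity-checked in closed form on a Gaussian example of our own choosing. The target is $\pi = \mathcal{N}(0,1)$, which is $1$-log-concave, so $\alpha = 1$ under the convention $\operatorname{KL} \le \operatorname{F}/(2\alpha)$. The initial density is $\mathcal{N}(m_0, v_0)$. Along the exact gradient flow, which here is the Ornstein-Uhlenbeck Fokker-Planck flow, $f_t$ stays Gaussian with mean $m_0 e^{-t}$ and variance $1 + (v_0 - 1)e^{-2t}$. The sketch checks $\frac{d}{dt}\operatorname{KL} = -\operatorname{F}$ by finite differences and also checks the exponential bound. Function names and parameter values are illustrative assumptions.

```python
import numpy as np


def kl_to_std_normal(m, v):
    # KL( N(m, v) || N(0, 1) ) in 1D; v is the variance.
    return 0.5 * (v + m ** 2 - 1.0 - np.log(v))


def rel_fisher_to_std_normal(m, v):
    # F( N(m, v) || N(0, 1) ) = E_f |grad log f - grad log pi|^2
    #                         = m^2 + (v - 1)^2 / v   (closed form for Gaussians).
    return m ** 2 + (v - 1.0) ** 2 / v


def gradient_flow_moments(t, m0, v0):
    # Exact moments of f_t under the Wasserstein GF of KL(. || N(0, 1)) from N(m0, v0).
    m = m0 * np.exp(-t)
    v = 1.0 + (v0 - 1.0) * np.exp(-2.0 * t)
    return m, v


def check_theorem_3_1(m0=3.0, v0=0.0625, alpha=1.0, h=1e-5):
    kl0 = kl_to_std_normal(m0, v0)
    max_rate_err, max_bound_gap = 0.0, -np.inf
    for t in np.linspace(0.1, 3.0, 30):
        # Central finite difference of KL(f_t || pi) in time.
        dkl_dt = (kl_to_std_normal(*gradient_flow_moments(t + h, m0, v0))
                  - kl_to_std_normal(*gradient_flow_moments(t - h, m0, v0))) / (2 * h)
        fisher = rel_fisher_to_std_normal(*gradient_flow_moments(t, m0, v0))
        # Dissipation identity: dKL/dt should equal -F (relative error recorded).
        max_rate_err = max(max_rate_err, abs(dkl_dt + fisher) / max(1.0, fisher))
        # LSI bound: KL(f_t || pi) - exp(-2 alpha t) KL(f_0 || pi) should be <= 0.
        kl_t = kl_to_std_normal(*gradient_flow_moments(t, m0, v0))
        max_bound_gap = max(max_bound_gap, kl_t - np.exp(-2 * alpha * t) * kl0)
    return max_rate_err, max_bound_gap


rate_err, bound_gap = check_theorem_3_1()
print(rate_err, bound_gap)  # rate_err is tiny; bound_gap is non-positive
```

In this example the mean contributes $\tfrac12 m_0^2 e^{-2t}$, which meets the bound with equality. The variance part decays strictly faster, because $v_t$ is a convex combination of $1$ and $v_0$.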

Figures (13)

  • Figure 1: Left: Langevin dynamics (stochastic). Right: ours (deterministic). The deterministic algorithm has the same marginal distributions as the stochastic one but with smooth trajectories. In this plot both algorithms interpolate between the unit Gaussian and a mixture of two Gaussians.
  • Figure 2: Visualization of the gradient-flow ODE. Particles are pulled toward the target $\pi$ and away from the particle density $f_t$. Brownian motion acts randomly.
  • Figure 3: Log-concave target experiment. Top: relative entropy dissipation rate of SBTM (ours) and SDE (stochastic). SBTM approximates the entropy decay rate well, while SDE is noisy. Bottom left: relative entropy of SBTM, SDE and the ground truth. SBTM approximates the ground truth well. Bottom right: $L^2$ error to the ground-truth solution. SBTM produces lower error with a smoother trajectory.
  • Figure 4: KL divergence ($\downarrow$) between the sample and the target $\pi = \mathcal{N}(0,1)$, using time step $0.002$ and final time $2.5$. SBTM exhibits better sample efficiency, likely due to determinism.
  • Figure 5: 1D Gaussian mixture experiment. Left: KL divergence of SBTM (ours) and SDE (stochastic) over time. SBTM exhibits smoother convergence. Right: entropy dissipation of SBTM and SDE. SBTM approximates the entropy decay rate perfectly with the computable quantity $\operatorname{F}\!\left(f_t\,\|\,\pi\right)$, while SDE is noisy.
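Figure 5 tracks entropy dissipation through the relative Fisher information $\operatorname{F}\!\left(f_t\,\|\,\pi\right)$. One way to compute it from particles replaces the unknown $\nabla \log f_t$ by the learned score: $\operatorname{F} \approx \frac{1}{N}\sum_i |\nabla \log \pi(X_i) - s(X_i)|^2$. This estimator is our assumption, not necessarily the paper's exact procedure. The sketch uses a hypothetical linear score model and compares the estimate with the closed form $m^2 + (v-1)^2/v$ for $f = \mathcal{N}(m, v)$ and $\pi = \mathcal{N}(0,1)$.

```python
import numpy as np


def fit_linear_score(x):
    # Hypothetical linear score model s(x) = a*x + b, fitted by exactly minimizing the
    # implicit score-matching loss mean(s(x)^2 + 2*a), which is quadratic in (a, b).
    m1, m2 = x.mean(), (x ** 2).mean()
    return np.linalg.solve(np.array([[m2, m1], [m1, 1.0]]), np.array([-1.0, 0.0]))


def fisher_estimate(x, theta, grad_log_target):
    # Monte Carlo estimate of F(f || pi) = E_f |grad log pi - grad log f|^2,
    # with the learned score s standing in for grad log f.
    a, b = theta
    return np.mean((grad_log_target(x) - (a * x + b)) ** 2)


rng = np.random.default_rng(1)
m, v = 1.0, 0.5
x = rng.normal(loc=m, scale=np.sqrt(v), size=100_000)  # particles from f = N(1, 0.5)
theta = fit_linear_score(x)
F_hat = fisher_estimate(x, theta, lambda y: -y)  # target pi = N(0, 1)
F_exact = m ** 2 + (v - 1.0) ** 2 / v              # closed form, equals 1.5 here
print(F_hat, F_exact)
```

In an SBTM run the score network is already trained at every step, so by Theorem 3.1 the value $-\hat{\operatorname{F}}$ gives an approximate entropy dissipation rate at essentially no extra cost.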

Theorems & Definitions (14)

  • Remark 2.1
  • Remark 2.2: Comparison with Langevin Dynamics
  • Remark 2.3
  • Theorem 3.1
  • Theorem 3.2: Small loss guarantees optimal entropy dissipation
  • Remark 3.3
  • Proof of the SBTM entropy-dissipation theorem
  • Remark 3.4
  • Theorem 3.5: Sufficient training guarantees bounded loss
  • Proof of the theorem that the loss is non-increasing