Score-based deterministic density sampling
Vasily Ilin, Peter Sushko, Jingwei Hu
TL;DR
The paper tackles sampling from an unnormalized density $\pi$ when only its score $\nabla \log \pi$ is available, proposing Score-Based Transport Modeling (SBTM) as a deterministic counterpart to stochastic, diffusion-based methods such as Langevin dynamics. SBTM couples a particle system with a time-varying score network $s^{\Theta_t}$, trained by score matching to approximate $\nabla \log f_t$, where $f_t$ is the current density of the particles. Particles evolve by $\dot X_t=\nabla \log \pi(X_t)-s^{\Theta_t}(X_t)$, while $\Theta_t$ is updated to minimize the score-matching loss $L(s^{\Theta_t}, f_t)$. The authors prove entropy-dissipation guarantees for these coupled dynamics, show that a small score-matching loss yields near-optimal convergence rates under a log-Sobolev inequality, and extend the results to annealed dynamics. Empirically, SBTM exhibits smooth trajectories, optimal or near-optimal convergence rates, and strong sample efficiency on low-dimensional, multimodal, and high-dimensional targets, including MNIST in $784$ dimensions, with memory and runtime scaling linearly in the number of particles. The approach is a practical, deterministic alternative to Langevin dynamics with interpretable convergence diagnostics and effective high-dimensional performance.
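The coupled particle/score update can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: instead of a neural network $s^{\Theta_t}$ trained by gradient steps, it swaps in a hypothetical affine score model $s(x) = Ax + b$. For that model, the implicit (Hyvärinen) score-matching loss evaluated on the particles, $\frac{1}{n}\sum_i |s(x_i)|^2 + 2\nabla\cdot s$, has the closed-form minimizer $A = -C^{-1}$, $b = -Am$, i.e. the Gaussian score built from the particles' mean $m$ and covariance $C$. The names `fit_affine_score` and `sbtm_affine`, the forward-Euler time step, and the Gaussian target are illustrative choices.

```python
import numpy as np

def fit_affine_score(X):
    """Hypothetical stand-in for training s^Theta: minimize the empirical
    implicit score-matching loss mean_i |s(x_i)|^2 + 2 div s over affine
    scores s(x) = A x + b. The minimizer is A = -C^{-1}, b = -A m, where m
    and C are the particles' mean and (divisor-n) covariance."""
    n = X.shape[0]
    m = X.mean(axis=0)
    Xc = X - m
    C = Xc.T @ Xc / n                 # (d, d) empirical covariance
    A = -np.linalg.inv(C)
    b = -A @ m
    return A, b

def sbtm_affine(X0, grad_log_pi, dt=0.01, n_steps=2000):
    """Forward-Euler discretization of dX/dt = grad log pi(X) - s(X),
    refitting the score on the current particles at every step."""
    X = np.array(X0, dtype=float)
    for _ in range(n_steps):
        A, b = fit_affine_score(X)         # score "training" step
        s = X @ A.T + b                    # s(x_i) = A x_i + b, row-wise
        X = X + dt * (grad_log_pi(X) - s)  # deterministic, noise-free update
    return X

# Usage: Gaussian target N(mu, Sigma), particles initialized from N(0, I).
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
P = np.linalg.inv(Sigma)

def grad_log_pi(X):
    # Row-wise -Sigma^{-1}(x - mu); P is symmetric, so right-multiplying is fine.
    return -(X - mu) @ P

rng = np.random.default_rng(0)
X0 = rng.standard_normal((500, 2))
X = sbtm_affine(X0, grad_log_pi)
print(X.mean(axis=0))                      # close to mu
print(np.cov(X, rowvar=False, bias=True))  # close to Sigma
```

Because an affine model can only represent Gaussian scores, this sketch is faithful only for Gaussian targets started from Gaussian particles, where $f_t$ stays Gaussian; multimodal targets need a richer model such as a neural network. Each step costs $O(nd^2 + d^3)$, linear in the number of particles $n$, and the update injects no noise, so rerunning from the same $X_0$ reproduces the same trajectory exactly.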
Abstract
We propose a deterministic sampling framework that uses Score-Based Transport Modeling to sample from an unnormalized target density $\pi$ given only its score $\nabla \log \pi$. Our method approximates the Wasserstein gradient flow of $\mathrm{KL}(f_t\|\pi)$ by learning the time-varying score $\nabla \log f_t$ on the fly via score matching. While it shares the marginal distributions of Langevin dynamics, our method produces smooth deterministic trajectories, resulting in monotone, noise-free convergence. We prove that our method dissipates relative entropy at the same rate as the exact gradient flow, provided the score is trained sufficiently well. Numerical experiments validate our theoretical findings: our method converges at the optimal rate, has smooth trajectories, and is often more sample efficient than its stochastic counterpart. Experiments on high-dimensional image data show that our method produces high-quality generations in as few as 15 steps and exhibits natural exploratory behavior. Memory and runtime scale linearly in the sample size.
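The entropy-dissipation claim can be illustrated by a short calculation. This is one elementary version of such an estimate, not necessarily the paper's exact theorem or constants. It assumes $L(s, f) = \int f\,|s - \nabla\log f|^2\,dx$, enough regularity and decay to integrate by parts, and a log-Sobolev inequality $\mathrm{KL}(f\|\pi) \le \frac{1}{2\lambda} I(f\,|\,\pi)$, where $I(f\,|\,\pi) = \int f\,|\nabla\log f - \nabla\log\pi|^2\,dx$ is the relative Fisher information. With velocity $v_t = \nabla\log\pi - s^{\Theta_t}$ and $\partial_t f_t + \nabla\cdot(f_t v_t) = 0$, write $u = \nabla\log f_t - \nabla\log\pi$ and $e = s^{\Theta_t} - \nabla\log f_t$:

$$
\frac{d}{dt}\mathrm{KL}(f_t\|\pi) = \int f_t\, v_t\cdot u\,dx = -\int f_t\,(u + e)\cdot u\,dx
\le -I(f_t\,|\,\pi) + \sqrt{L(s^{\Theta_t}, f_t)\, I(f_t\,|\,\pi)}.
$$

For any $\delta \in (0,1]$, Young's inequality $\sqrt{LI} \le \delta I + \frac{L}{4\delta}$ together with the log-Sobolev inequality gives

$$
\frac{d}{dt}\mathrm{KL}(f_t\|\pi) \le -2\lambda(1-\delta)\,\mathrm{KL}(f_t\|\pi) + \frac{1}{4\delta}\,L(s^{\Theta_t}, f_t),
$$

so by Grönwall,

$$
\mathrm{KL}(f_t\|\pi) \le e^{-2\lambda(1-\delta)t}\,\mathrm{KL}(f_0\|\pi) + \frac{1}{4\delta}\int_0^t e^{-2\lambda(1-\delta)(t-r)}\,L(s^{\Theta_r}, f_r)\,dr.
$$

With the exact score ($e = 0$) the first line reduces to $\frac{d}{dt}\mathrm{KL} = -I(f_t\,|\,\pi)$, the dissipation of the exact gradient flow; a small training loss lets $\delta$ be taken small, giving a rate close to the optimal $2\lambda$.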
