From discrete-time policies to continuous-time diffusion samplers: Asymptotic equivalences and faster training
Julius Berner, Lorenz Richter, Marcin Sendera, Jarrid Rector-Brooks, Nikolay Malkin
TL;DR
This work tackles sampling from a Boltzmann distribution $p_{\mathrm{target}}(x)=\exp(-\mathcal{E}(x))/Z$ with neural diffusion samplers, in the setting where the normalizing constant $Z$ is unknown and no target samples are available. It bridges discrete-time RL objectives and continuous-time diffusion/PDE formulations using path-space measures, Radon-Nikodym derivatives, and Nelson's identity. In the limit of infinitesimal time steps, global trajectory-level objectives converge to continuous-time divergences, while local detailed-balance constraints converge to the Fokker-Planck equation. The main contributions are (i) asymptotic equivalences between discrete-time and continuous-time objectives, (ii) PDE-like constraints derived from local reversibility, and (iii) evidence that training with coarse, nonuniform time discretizations achieves competitive performance on standard benchmarks at substantially lower computational cost. In practice, these findings enable faster, more scalable diffusion samplers for high-dimensional Boltzmann targets without requiring samples from $p_{\mathrm{target}}$.
Abstract
We study the problem of training neural stochastic differential equations, or diffusion models, to sample from a Boltzmann distribution without access to target samples. Existing methods for training such models enforce time-reversal of the generative and noising processes, using either differentiable simulation or off-policy reinforcement learning (RL). We prove equivalences between families of objectives in the limit of infinitesimal discretization steps, linking entropic RL methods (GFlowNets) with continuous-time objects (partial differential equations and path space measures). We further show that an appropriate choice of coarse time discretization during training allows greatly improved sample efficiency and the use of time-local objectives, achieving competitive performance on standard sampling benchmarks with reduced computational cost.
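To make the idea of a coarse, nonuniform time discretization concrete, the following is a minimal sketch of Euler-Maruyama simulation of an SDE on a randomly drawn time grid. It is an illustration under stated assumptions, not the authors' implementation: the function names `random_time_grid`, `euler_maruyama`, and `placeholder_drift` are hypothetical, the linear drift stands in for a trained neural network, and the uniform sampling of interior grid points is only one possible way to choose a nonuniform grid.

```python
import numpy as np


def random_time_grid(num_steps, t_end=1.0, rng=None):
    """Return a nonuniform grid 0 = t_0 <= ... <= t_N = t_end with N = num_steps.

    Assumption: interior points are drawn uniformly at random and sorted.
    """
    rng = np.random.default_rng() if rng is None else rng
    interior = np.sort(rng.uniform(0.0, t_end, size=num_steps - 1))
    return np.concatenate(([0.0], interior, [t_end]))


def placeholder_drift(x, t):
    """Hypothetical stand-in for a learned drift network: pulls x toward 0."""
    return -x


def euler_maruyama(drift, x0, times, sigma=1.0, rng=None):
    """Simulate dX = drift(X, t) dt + sigma dW on an arbitrary time grid.

    Returns an array of shape (len(times), *x0.shape) holding the whole path.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for t, t_next in zip(times[:-1], times[1:]):
        dt = t_next - t  # step size varies from step to step
        noise = rng.standard_normal(x.shape)
        x = x + drift(x, t) * dt + sigma * np.sqrt(dt) * noise
        path.append(x.copy())
    return np.stack(path)


# Usage: 8 coarse steps for a batch of 4 one-dimensional particles.
rng = np.random.default_rng(0)
times = random_time_grid(8, rng=rng)
x0 = rng.standard_normal(4)
path = euler_maruyama(placeholder_drift, x0, times, rng=rng)
```

Because each step uses its own `dt`, the same simulation routine works for uniform and nonuniform grids alike; only the array passed as `times` changes.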
