From discrete-time policies to continuous-time diffusion samplers: Asymptotic equivalences and faster training

Julius Berner, Lorenz Richter, Marcin Sendera, Jarrid Rector-Brooks, Nikolay Malkin

TL;DR

This work tackles sampling from the unnormalized Boltzmann distribution $p_{\rm target}(x)=\exp(-\mathcal{E}(x))/Z$ when $Z$ and samples are unavailable, using neural diffusion samplers. It builds a bridge between discrete-time RL objectives and continuous-time diffusion/PDE formulations by leveraging path-space measures, Radon-Nikodym derivatives, and Nelson's identity, and shows how global trajectory objectives converge to continuous-time divergences while local detailed-balance constraints converge to the Fokker-Planck equation. The main contributions include (i) establishing asymptotic equivalences between discrete-time and continuous-time objectives, (ii) deriving PDE-like constraints that arise from local reversibility, and (iii) demonstrating that training with coarse, nonuniform time discretizations can achieve competitive performance with substantially lower computation on standard benchmarks. Practically, these findings enable faster, more scalable diffusion samplers for high-dimensional Boltzmann targets without requiring samples from $p_{\rm target}$.

Abstract

We study the problem of training neural stochastic differential equations, or diffusion models, to sample from a Boltzmann distribution without access to target samples. Existing methods for training such models enforce time-reversal of the generative and noising processes, using either differentiable simulation or off-policy reinforcement learning (RL). We prove equivalences between families of objectives in the limit of infinitesimal discretization steps, linking entropic RL methods (GFlowNets) with continuous-time objects (partial differential equations and path space measures). We further show that an appropriate choice of coarse time discretization during training allows greatly improved sample efficiency and the use of time-local objectives, achieving competitive performance on standard sampling benchmarks with reduced computational cost.

Paper Structure (44 sections, 3 theorems, 47 equations, 12 figures, 3 tables)

Key Result

Proposition 3.1

As $\max_{n=0}^{N-1} \Delta t_n\to0$, $\iota(\widehat{X})$ converges weakly and strongly to $X$ with order $\gamma=1$ and the path measures $\iota_*\widehat{\mathbbm{P}}$ converge weakly to $\mathbbm{P}$.
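
To make the object in Proposition 3.1 concrete, the following is a minimal sketch of an Euler-Maruyama scheme on a possibly nonuniform time grid. It is an illustration, not the authors' implementation: the function names `euler_maruyama` and `ou_drift`, the Ornstein-Uhlenbeck drift, the constant diffusion coefficient `sigma`, and the example grid are all assumptions chosen for the example. The step sizes $\Delta t_n$ of the proposition correspond to the differences between consecutive entries of `times`.

```python
import math
import random


def euler_maruyama(drift, sigma, x0, times, rng):
    """Simulate dX_t = drift(X_t, t) dt + sigma dW_t on the grid `times`.

    `sigma` is a constant (assumption for this sketch); `times` is an
    increasing list of time points, not necessarily equally spaced.
    Returns the discrete path (X_0, ..., X_N) as a list.
    """
    path = [x0]
    for t_n, t_next in zip(times[:-1], times[1:]):
        dt = t_next - t_n                 # step size Delta t_n
        noise = rng.gauss(0.0, 1.0)       # standard normal increment
        x = path[-1]
        path.append(x + drift(x, t_n) * dt + sigma * math.sqrt(dt) * noise)
    return path


def ou_drift(x, t):
    """Hypothetical example drift: Ornstein-Uhlenbeck, dX = -X dt + sigma dW."""
    return -x


rng = random.Random(0)
times = [0.0, 0.05, 0.2, 0.5, 0.6, 1.0]   # a coarse, nonuniform grid
path = euler_maruyama(ou_drift, 0.5, 1.0, times, rng)
```

Setting `sigma` to zero reduces the scheme to the explicit Euler method, which gives a quick sanity check: on a fine grid the final value for `ou_drift` should approach $e^{-1}$ as the largest step shrinks.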

Figures (12)

  • Figure 1: The problem of making continuous-time forward and reverse processes determine the same path space measure is approximated by matching distributions over discrete-time trajectories.
  • Figure 2: Training objectives for neural SDEs (top row) and their approximations by objectives for discrete-time policies (bottom row). On-policy objectives minimize a divergence by differentiating through SDE integration, while off-policy objectives enforce local or global consistency constraints. Our results explain the behavior of discrete-time objectives as the time discretization becomes finer.
  • Figure 3: The MDP and policy representing the process $\widehat{\mathbbm{P}}$, a distribution over $\widehat{X}=(\widehat{X}_0,\dots,\widehat{X}_N)$.
  • Figure 4: Sampled 10-step discretizations of the unit interval using the three schemes considered.
  • Figure 6: Left: Time to train for 25k iterations on Manywell as a function of $N_{\rm train}$, mean and std over 3 runs (note the log-log scale). Right: Runtime and ELBO gap, showing that Random discretization gives a superior balance of speed and performance. The marker area is proportional to $N_{\rm train}$. Results for 25GMM and Funnel densities in \ref{fig:timing_app}.
  • ...and 7 more figures
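
The captions of Figures 4 and 6 refer to discretizations of the unit interval, including a randomized one. This summary does not spell out how the three schemes are defined, so the sketch below only illustrates the general idea under stated assumptions: `uniform_grid` is the equally spaced grid, and `random_grid` is one hypothetical way to randomize a grid (sorting independent uniform draws), which may differ from the scheme the paper calls Random. All function names are invented for this example.

```python
import random


def uniform_grid(n_steps):
    """Equally spaced grid 0 = t_0 < ... < t_N = 1 with N = n_steps."""
    return [k / n_steps for k in range(n_steps + 1)]


def random_grid(n_steps, rng):
    """Hypothetical randomized grid: sorted uniform interior points.

    This is an illustrative assumption, not necessarily the paper's scheme.
    """
    interior = sorted(rng.random() for _ in range(n_steps - 1))
    return [0.0] + interior + [1.0]


def step_sizes(grid):
    """Differences t_{n+1} - t_n between consecutive grid points."""
    return [b - a for a, b in zip(grid[:-1], grid[1:])]


rng = random.Random(0)
grid = random_grid(10, rng)            # a sampled 10-step discretization
max_step = max(step_sizes(grid))       # the quantity sent to 0 in Prop. 3.1
```

Drawing a fresh grid at every training iteration means the model sees many different time points even when each trajectory uses only a few steps, which is the intuition behind training with coarse discretizations.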

Theorems & Definitions (7)

  • Proposition 3.1: Convergence of Euler-Maruyama scheme
  • Definition B.1: Strong convergence
  • Definition B.2: Weak convergence
  • Lemma B.3: Convergence of Radon-Nikodym derivatives
  • proof
  • Lemma B.4: Continuous-time asymptotics of the DB discrepancy
  • proof