
Approximating Langevin Monte Carlo with ResNet-like Neural Network architectures

Charles Miranda, Janina Schütte, David Sommer, Martin Eigel

TL;DR

This work studies how to approximate Langevin Monte Carlo (LMC) sampling with a ResNet-like neural network architecture that maps samples from a simple reference distribution to a target distribution defined by a potential V. It analyzes the approximation quality in the Wasserstein-2 distance, assuming sub-Gaussianity of the intermediate LMC measures, under two drift-approximation regimes (globally linear error growth and local Lipschitz constraints), and proves complexity bounds that avoid the curse of dimensionality in favorable settings. The main contributions are a formal neural surrogate for the LMC drift, uniform bounds on the variance proxies, and a proof that the resulting neural network can sample smooth, strongly convex targets to arbitrary accuracy; experiments on Gaussian, Gaussian mixture, and Darcy-PDE posteriors support the analysis. The results offer a principled route to fast, scalable surrogate sampling in high dimensions, with practical potential for Bayesian inverse problems and uncertainty quantification, where expensive forward solves make traditional MCMC costly.

Abstract

We sample from a given target distribution by constructing a neural network which maps samples from a simple reference, e.g. the standard normal distribution, to samples from the target. To that end, we propose using a neural network architecture inspired by the Langevin Monte Carlo (LMC) algorithm. Based on LMC perturbation results, we show approximation rates of the proposed architecture for smooth, log-concave target distributions measured in the Wasserstein-$2$ distance. The analysis heavily relies on the notion of sub-Gaussianity of the intermediate measures of the perturbed LMC process. In particular, we derive bounds on the growth of the intermediate variance proxies under different assumptions on the perturbations. Moreover, we propose an architecture similar to deep residual neural networks and derive expressivity results for approximating the sample-to-target distribution map.
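The LMC recursion that inspires the architecture can be read as a residual network: each step computes $Y_{k+1} = Y_k - h\nabla V(Y_k) + \sqrt{2h}\,\xi_k$ with $\xi_k$ standard Gaussian, i.e. identity plus a drift term plus injected noise. The following is a minimal Python sketch under our own assumptions: the exact drift $-\nabla V$ stands in for the neural surrogate, the noise is drawn on the fly, and the names `lmc_resnet` and `grad_V` are hypothetical, not taken from the paper.

```python
import math
import random


def lmc_resnet(y0, grad_V, h, K, rng):
    """Apply K residual 'blocks' of the form y <- y - h*grad_V(y) + sqrt(2h)*xi.

    y0: list of floats (one sample in R^d); grad_V: callable returning a list.
    The exact gradient plays the role of the learned drift (an assumption).
    """
    y = list(y0)
    scale = math.sqrt(2.0 * h)
    for _ in range(K):
        g = grad_V(y)
        # Residual update: identity + drift step + Gaussian noise injection.
        y = [yi - h * gi + scale * rng.gauss(0.0, 1.0) for yi, gi in zip(y, g)]
    return y


# Usage: standard Gaussian target V(x) = |x|^2 / 2, so grad_V(x) = x.
rng = random.Random(0)
h, K = 0.1, 60
samples = [lmc_resnet([rng.gauss(0.0, 1.0)], lambda y: list(y), h, K, rng)[0]
           for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
# For this target the discretized chain is stationary at variance 2 / (2 - h),
# slightly above the target variance 1: the usual LMC discretization bias.
print(round(mean, 2), round(var, 2))
```

Reading each step as a residual block is what makes a ResNet-like network a natural surrogate: replacing the exact drift by a learned one changes only the middle term of each block.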

Paper Structure (39 sections, 29 theorems, 158 equations, 6 figures)

Key Result

Theorem 1.3

Assume that $V\colon \mathbb{R}^d\rightarrow\mathbb{R}$ is an $m$-strongly convex potential with $M$-Lipschitz gradient as in Assumption \ref{assump:potential}. Let $\mu_{0}$ be sub-Gaussian with variance proxy $\sigma_0^2 > 0$ and $Y_0\sim \mu_0$. Then, for $\varepsilon > 0$, $h\in(0,\frac{2}{m+M})$ and $K \in \mathbb{N}$, there […] Furthermore, the complexity of $\Psi$ can be bounded as follows.
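The step-size range $h\in(0,\frac{2}{m+M})$ is the standard regime in which a single gradient step $x\mapsto x-h\nabla V(x)$ is a contraction with factor $1-hm$ for an $m$-strongly convex potential with $M$-Lipschitz gradient. A minimal Python sketch of this effect, under our own choice of example: two LMC chains driven by the same noise (synchronous coupling) for the one-dimensional potential $V(x)=x^2/2+\log\cosh x$, for which $m=1$ and $M=2$. The potential and the helper `coupled_lmc` are hypothetical illustrations, not from the paper.

```python
import math
import random

# Illustrative potential (our choice): V(x) = x^2/2 + log(cosh(x)).
# V''(x) = 1 + sech(x)^2 lies in [1, 2], so m = 1 and M = 2.
m, M = 1.0, 2.0


def grad_V(x):
    return x + math.tanh(x)


def coupled_lmc(x, y, h, K, rng):
    """Run two LMC chains with shared noise; return the distances |x_k - y_k|."""
    dists = [abs(x - y)]
    for _ in range(K):
        xi = rng.gauss(0.0, 1.0)  # the same noise drives both chains
        x = x - h * grad_V(x) + math.sqrt(2.0 * h) * xi
        y = y - h * grad_V(y) + math.sqrt(2.0 * h) * xi
        dists.append(abs(x - y))
    return dists


h = 0.5  # inside (0, 2 / (m + M)) = (0, 2/3)
assert 0.0 < h < 2.0 / (m + M)
dists = coupled_lmc(5.0, -5.0, h, 10, random.Random(1))
rate = 1.0 - h * m
# Each step multiplies the distance by at most 1 - h*m (up to round-off).
for k in range(1, len(dists)):
    assert dists[k] <= rate * dists[k - 1] + 1e-12
```

Because the noise cancels in the difference of the coupled chains, only the gradient step acts on their distance; this is the mechanism behind the geometric factor in LMC error bounds.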

Figures (6)

  • Figure 1.1: Sketch of the ResNet-like architecture used in this work.
  • Figure 6.1: Visualization of the bounding of the neural network used in \ref{thm: bounded network} in one dimension. Here, $f(x)=-\sigma(-x+1)+1$ is bounded from above by $1$ and $g(x) = \sigma(x+1)-1$ is bounded from below by $-1$. Note that $g\circ f$ is the identity on $[-1,1]$ and bounded by $\pm 1$ on $\mathbb{R}$. The bounding of the network in \ref{thm: bounded network} corresponds to applying similar functions in every dimension.
  • Figure 6.2: Sketch of the construction of the network from \ref{prop:combined_assump}. On the $\ell^1$-ball of radius $r$, $B^1_r(0)$, the network approximates $-\nabla V$. On $B_b^1(0)\setminus B_r^1(0)$, where $b > r$, the network (approximately) interpolates linearly between $-\nabla V$ and $x\mapsto mx$. On $\mathbb{R}^d\setminus B_b^1(0)$, the network is identical to $x\mapsto mx$. In this way, global approximation with linearly growing error is achieved. For the precise construction and the choice of $b$, we refer to the proof of \ref{prop:combined_assump}.
  • Figure 7.1: Comparison between LMC and NN model on a Gaussian target distribution.
  • Figure 7.2: Comparison between LMC and NN model for a Gaussian mixture target density.
  • ...and 1 more figure
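The constructions behind Figures 6.1 and 6.2 can be sketched in a few lines of Python. We assume here that $\sigma$ is the ReLU activation (the captions do not restate it). The blending function uses the exact $-\nabla V$ in place of the paper's network and a single linear ramp in the $\ell^1$-norm between the radii $r$ and $b$; that ramp is our assumption about how the interpolation is parametrized. The far-field map follows the caption as written, $x\mapsto mx$. The names `clamp_unit` and `blended_drift` are hypothetical.

```python
def relu(t):
    return max(t, 0.0)


def f(x):
    # f(x) = -relu(-x + 1) + 1 = min(x, 1): caps the input from above at 1.
    return -relu(-x + 1.0) + 1.0


def g(x):
    # g(x) = relu(x + 1) - 1 = max(x, -1): caps the input from below at -1.
    return relu(x + 1.0) - 1.0


def clamp_unit(v):
    """Apply g(f(.)) coordinate-wise: identity on [-1, 1]^d, bounded by +-1."""
    return [g(f(vi)) for vi in v]


def blended_drift(x, neg_grad_V, m, r, b):
    """Piecewise blend on l1-balls, mirroring the idea of Figure 6.2.

    Inside B_r: -grad V(x); outside B_b: m*x; in between: linear ramp in the
    l1-norm (an assumption). Here -grad V is exact, not a network.
    """
    t = sum(abs(xi) for xi in x)  # l1-norm of x
    lam = min(max((b - t) / (b - r), 0.0), 1.0)  # 1 on B_r, 0 outside B_b
    inner = neg_grad_V(x)
    return [lam * ui + (1.0 - lam) * m * xi for ui, xi in zip(inner, x)]


print(clamp_unit([-3.0, -0.5, 0.25, 2.0]))  # [-1.0, -0.5, 0.25, 1.0]
```

Since `clamp_unit` is built only from ReLUs, affine maps, and composition, it can be realized exactly by a small ReLU network, which is why this kind of bounding fits inside the network class.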

Theorems & Definitions (58)

  • Theorem 1.3: Main convergence result
  • Remark 2.1: Balls and spheres
  • Definition 2.2: Wasserstein space
  • Theorem 2.3: Guarantees for the constant-step LMC Dalalyan2017UserfriendlyGF
  • Definition 2.4: Sub-Gaussian random variable
  • Definition 2.5: Sub-Gaussian random vector
  • Proposition 2.6: $\ell^p$-norm of a sub-Gaussian random vector is sub-Gaussian
  • Definition 2.7: Lyapunov function altschuler2022concentration
  • Lemma 2.8: Connection between sub-Gaussianity and Lyapunov functions
  • Definition 2.9: Neural network architectures
  • ...and 48 more
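Definition 2.4 is not reproduced in this summary. One common convention, which we assume here, calls a centered real random variable $X$ sub-Gaussian with variance proxy $\sigma^2$ if $\mathbb{E}[e^{\lambda X}]\le e^{\lambda^2\sigma^2/2}$ for all $\lambda\in\mathbb{R}$. A Rademacher variable ($\pm1$ with probability $1/2$ each) satisfies this with $\sigma^2=1$, since its moment generating function is $\cosh\lambda$. The Python sketch below checks that inequality on a grid; `is_subgaussian_on_grid` is a hypothetical helper, and a grid check only illustrates the bound rather than proving it.

```python
import math


def rademacher_mgf(lam):
    # E[exp(lam * X)] for X = +1 or -1 with probability 1/2 each.
    return 0.5 * (math.exp(lam) + math.exp(-lam))  # equals cosh(lam)


def gaussian_bound(lam, sigma2):
    # MGF bound defining sub-Gaussianity with variance proxy sigma2.
    return math.exp(0.5 * lam * lam * sigma2)


def is_subgaussian_on_grid(mgf, sigma2, lams):
    """Check mgf(lam) <= exp(lam^2 * sigma2 / 2) at every grid point."""
    # The tiny relative slack absorbs floating-point round-off near equality.
    return all(mgf(lam) <= gaussian_bound(lam, sigma2) * (1 + 1e-12)
               for lam in lams)


grid = [k / 10.0 for k in range(-50, 51)]
print(is_subgaussian_on_grid(rademacher_mgf, 1.0, grid))  # True
print(is_subgaussian_on_grid(rademacher_mgf, 0.5, grid))  # False
```

A smaller proxy such as $\sigma^2=0.5$ fails already at $\lambda=1$, since $\cosh 1\approx 1.543 > e^{0.25}\approx 1.284$; the variance proxy is therefore a genuine constraint, not just any positive number.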