Approximating Langevin Monte Carlo with ResNet-like Neural Network architectures

Charles Miranda; Janina Schütte; David Sommer; Martin Eigel

TL;DR

This work studies how to approximate Langevin Monte Carlo sampling with a ResNet-like neural network architecture that maps samples from a simple reference distribution to a target distribution defined by a potential V. It analyzes the approximation quality in Wasserstein-2 distance under sub-Gaussianity of intermediate LMC measures and two drift-approximation regimes: global linear error growth and local Lipschitz constraints, proving complexity bounds that avoid the curse of dimensionality in favorable settings. The main contributions include formalizing a neural surrogate for the LMC drift, establishing uniform bounds on variance proxies, and proving that the resulting neural network can achieve arbitrary accuracy in sampling for smooth, strongly convex targets, supported by experiments on Gaussian, Gaussian mixture, and Darcy-PDE posteriors. The results provide a principled route to fast, scalable surrogate sampling in high dimensions, with practical potential for Bayesian inverse problems and uncertainty quantification where expensive forward solves constrain traditional MCMC.
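
For intuition about the error metric used above: in one dimension, the Wasserstein-2 distance between two empirical measures with equally many, equally weighted samples can be computed by sorting both samples and matching them by rank. The sketch below illustrates this; the function name `w2_empirical_1d` and the two Gaussian test samples are illustrative choices, not taken from the paper.

```python
import math
import random


def w2_empirical_1d(xs, ys):
    """Wasserstein-2 distance between two equal-size 1D empirical measures.

    In 1D the optimal coupling pairs the i-th smallest point of xs with the
    i-th smallest point of ys, so sorting is enough.
    """
    if len(xs) != len(ys):
        raise ValueError("both samples must have the same size")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    mean_sq = sum((x - y) ** 2 for x, y in zip(xs_sorted, ys_sorted)) / len(xs)
    return math.sqrt(mean_sq)


# Illustrative check: N(0, 1) versus N(2, 1). The exact W2 distance between
# these two Gaussians is 2 (the difference of the means).
rng = random.Random(0)
a = [rng.gauss(0.0, 1.0) for _ in range(20000)]
b = [rng.gauss(2.0, 1.0) for _ in range(20000)]
w2_ab = w2_empirical_1d(a, b)
```

This rank-matching shortcut only holds in one dimension; in higher dimensions an optimal transport solver would be needed.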

Abstract

We sample from a given target distribution by constructing a neural network which maps samples from a simple reference, e.g. the standard normal distribution, to samples from the target. To that end, we propose using a neural network architecture inspired by the Langevin Monte Carlo (LMC) algorithm. Based on LMC perturbation results, we show approximation rates of the proposed architecture for smooth, log-concave target distributions measured in the Wasserstein-$2$ distance. The analysis heavily relies on the notion of sub-Gaussianity of the intermediate measures of the perturbed LMC process. In particular, we derive bounds on the growth of the intermediate variance proxies under different assumptions on the perturbations. Moreover, we propose an architecture similar to deep residual neural networks and derive expressivity results for approximating the sample to target distribution map.
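
To make the link between LMC and residual networks concrete, the following minimal sketch writes one LMC step, $x \mapsto x - h\nabla V(x) + \sqrt{2h}\,\xi$ with $\xi$ standard normal, as a residual block `x + h * drift(x) + noise`. Several things here are assumptions for illustration only: the 1D Gaussian potential, the names `lmc_block` and `sample`, the use of the exact drift $-\nabla V$ as a stand-in for a trained network, and the choice to draw the noise inside each block (this summary does not specify how noise enters the paper's architecture).

```python
import math
import random


def grad_V(x, mu=1.0, s=0.5):
    """Gradient of the illustrative potential V(x) = (x - mu)^2 / (2 s^2)."""
    return (x - mu) / (s * s)


def lmc_block(x, drift, h, rng):
    """One residual block: identity skip connection plus drift plus Gaussian noise."""
    return x + h * drift(x) + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)


def sample(n, K, h, drift, seed=0):
    """Push n standard-normal reference samples through K stacked blocks."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # reference sample
        for _ in range(K):
            x = lmc_block(x, drift, h, rng)
        out.append(x)
    return out


# Stand-in for a learned drift surrogate: here simply the exact -grad V.
drift = lambda x: -grad_V(x)

samples = sample(n=5000, K=200, h=0.01, drift=drift)
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
# Target is N(1, 0.25); mean and var should be close to 1 and 0.25,
# up to Monte Carlo error and a small step-size bias.
```

Replacing `drift` by an approximation of $-\nabla V$ gives a perturbed LMC chain, which is the setting the abstract's perturbation analysis refers to.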

Paper Structure (39 sections, 29 theorems, 158 equations, 6 figures)

Introduction and scope
Related work
Deep Neural Networks
Sampling and Langevin Monte Carlo
Methodology
Contribution
Main result
Structure of the paper
Definitions and notation
Langevin Monte Carlo and Wasserstein space
Sub-Gaussianity
Lyapunov functions
Neural networks
ResNet-like architectures
Perturbed Langevin Monte Carlo
...and 24 more sections

Key Result

Theorem 1.3

Assume that $V\colon \mathbb{R}^d\rightarrow\mathbb{R}$ is an $m$-strongly convex potential with $M$-Lipschitz gradient, as in the assumption on the potential (assump:potential). Let $\mu_{0}$ be sub-Gaussian with variance proxy $\sigma_0^2 > 0$ and $Y_0\sim \mu_0$. Then, for $\varepsilon > 0$, $h\in(0,\frac{2}{m+M})$ and $K \in \mathbb{N}$, there … Furthermore, the complexity of $\Psi$ can be bounded as follows.
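
The step-size range $h\in(0,\frac{2}{m+M})$ can be motivated by a standard fact about gradient steps, shown here on a toy example rather than as part of the paper's proof. For the quadratic potential $V(x) = \frac{1}{2}(m x_1^2 + M x_2^2)$, the map $x \mapsto x - h\nabla V(x)$ scales the two coordinates by $1-hm$ and $1-hM$, so its Lipschitz constant is $\max(|1-hm|, |1-hM|)$, which equals $1-hm$ for $h \le \frac{2}{m+M}$. The function name `contraction_factor` and the values of $m$ and $M$ are hypothetical.

```python
def contraction_factor(h, m, M):
    """Lipschitz constant of x -> x - h * grad V(x) for V(x) = (m*x1^2 + M*x2^2) / 2."""
    return max(abs(1.0 - h * m), abs(1.0 - h * M))


m, M = 1.0, 9.0              # illustrative curvature bounds
h_max = 2.0 / (m + M)        # upper end of the step-size interval, here 0.2

inside = contraction_factor(0.5 * h_max, m, M)   # h = 0.1: factor is 1 - h*m
at_edge = contraction_factor(h_max, m, M)        # h = 0.2: both terms coincide
beyond = contraction_factor(0.25, m, M)          # h > 2/M: the step expands
```

Inside the interval the deterministic part of each LMC step is a contraction with factor $1-hm$; for $h > 2/M$ the factor exceeds one and the iteration can diverge.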

Figures (6)

Figure 1.1: Sketch of the ResNet-like architecture used in this work.
Figure 6.1: Visualization of the bounding of the neural network used in the bounded-network theorem (thm: bounded network) in one dimension. Here, $f(x)=-\sigma(-x+1)+1$ defines a bound from above by $1$ and $g(x) = \sigma( x+1) -1$ defines a bound from below by $-1$. Note that $g\circ f$ is the identity on $[-1,1]$ and bounded by $\pm 1$ on $\mathbb{R}$. The bounding of the network in that theorem corresponds to an application of similar functions in every dimension.
Figure 6.2: Sketch of the construction of the network from the proposition prop:combined_assump. On the $\ell^1$-ball of radius $r$, $B^1_r(0)$, the network approximates $-\nabla V$. On $B_b^1(0)\setminus B_r^1(0)$, where $b > r$, the network (approximately) interpolates linearly between $-\nabla V$ and $x\mapsto mx$. On $\mathbb{R}^d\setminus B_b^1(0)$, the network is identical to $x\mapsto mx$. In this way, global approximation with linearly growing error is achieved. For the precise construction and the choice of $b$, we refer to the proof of that proposition.
Figure 7.1: Comparison between LMC and NN model on a Gaussian target distribution.
Figure 7.2: Comparison between LMC and NN model for a Gaussian mixture target density.
...and 1 more figure
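
The two constructions described in the captions of Figures 6.1 and 6.2 can be sketched as plain functions. This is an illustration under assumptions, not the paper's network: $\sigma$ is taken to be the ReLU (which is what makes $g\circ f$ the identity on $[-1,1]$), the names `clamp_unit`, `blend_weight` and `glued_drift` are hypothetical, and `inner` is an arbitrary stand-in for an approximation of $-\nabla V$. The paper realizes the interpolation only approximately and with network layers; here it is exact and written directly.

```python
def relu(x):
    return max(0.0, x)


def f(x):
    """Bounded from above by 1 (Figure 6.1)."""
    return -relu(-x + 1.0) + 1.0


def g(x):
    """Bounded from below by -1 (Figure 6.1)."""
    return relu(x + 1.0) - 1.0


def clamp_unit(x):
    """g after f: identity on [-1, 1], saturating at -1 and 1 outside."""
    return g(f(x))


def l1_norm(x):
    return sum(abs(v) for v in x)


def blend_weight(x, r, b):
    """1 on the l1-ball of radius r, 0 outside radius b, linear in the l1-norm between."""
    t = (b - l1_norm(x)) / (b - r)
    return min(1.0, max(0.0, t))


def glued_drift(x, inner, outer, r, b):
    """Use `inner` near the origin, `outer` far away, and interpolate in between (Figure 6.2)."""
    w = blend_weight(x, r, b)
    return [w * a + (1.0 - w) * c for a, c in zip(inner(x), outer(x))]


# Hypothetical stand-ins for illustration only.
m = 1.0
inner = lambda x: [-2.0 * v for v in x]  # placeholder for an approximation of -grad V
outer = lambda x: [m * v for v in x]     # the linear map x -> m x from the caption

clamped = [clamp_unit(x) for x in (-3.0, -1.0, -0.25, 0.0, 0.5, 1.0, 4.0)]
near = glued_drift([0.5, 0.25], inner, outer, r=1.0, b=3.0)   # inside radius r
mid = glued_drift([1.0, 1.0], inner, outer, r=1.0, b=3.0)     # halfway between r and b
far = glued_drift([4.0, 0.0], inner, outer, r=1.0, b=3.0)     # outside radius b
```

Applying `clamp_unit` coordinate-wise gives the per-dimension bounding mentioned in the caption of Figure 6.1.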

Theorems & Definitions (58)

Theorem 1.3: Main convergence result
Remark 2.1: Balls and spheres
Definition 2.2: Wasserstein space
Theorem 2.3: Guarantees for the constant-step LMC [Dalalyan2017UserfriendlyGF]
Definition 2.4: Sub-Gaussian random variable
Definition 2.5: Sub-Gaussian random vector
Proposition 2.6: $\ell^p$-norm of a sub-Gaussian random vector is sub-Gaussian
Definition 2.7: Lyapunov function [altschuler2022concentration]
Lemma 2.8: Connection between sub-Gaussianity and Lyapunov functions
Definition 2.9: Neural network architectures
...and 48 more
