Multilevel-Langevin pathwise average for Gibbs approximation
Maxime Egéa, Fabien Panloup
TL;DR
The paper introduces a Multilevel-Langevin pathwise average method to approximate Gibbs distributions $\pi(dx) \propto e^{-U(x)}dx$ via overdamped Langevin diffusions. By combining occupation measures from Euler discretizations at geometrically decreasing steps, the authors achieve an $\varepsilon$-approximation with complexity of order $\varepsilon^{-2}$, up to logarithmic factors, and derive explicit $d$-dependent bounds in strongly convex settings. Two main regimes are analyzed: (i) $a=1$, $\delta=1/2$, yielding $\varepsilon^{-2}$-type complexity with $\log$ factors, and (ii) $a=2$, $\delta=1$, achieving $\varepsilon^{-2}$ complexity without the extra logs under stronger smoothness. Theoretical results are complemented by numerical experiments on Ornstein–Uhlenbeck and logistic-type perturbations, comparisons with ULA/MALA, and investigations into robustness in nonconvex settings. Overall, the work provides a practically appealing, dimension-aware multilevel MCMC framework for sampling from Gibbs distributions, with provable complexity guarantees and explicit parameter guidance.
Abstract
We propose and study a new multilevel method for the numerical approximation of a Gibbs distribution $\pi$ on $\mathbb{R}^d$, based on (overdamped) Langevin diffusions. This method, inspired by \cite{mainPPlangevin} and \cite{giles_szpruch_invariant}, relies on a multilevel occupation measure, i.e. on an appropriate combination of $R$ occupation measures of (constant-step) Euler schemes with respective steps $\gamma_r = \gamma_0 2^{-r}$, $r=0,\ldots,R$. We first state a quantitative result under general assumptions which guarantees an \textit{$\varepsilon$-approximation} (in an $L^2$ sense) with a cost of order $\varepsilon^{-2}$, or $\varepsilon^{-2}|\log \varepsilon|^3$ under less contractive assumptions. We then apply it to overdamped Langevin diffusions with strongly convex potential $U:\mathbb{R}^d\rightarrow\mathbb{R}$ and obtain an \textit{$\varepsilon$-complexity} of order ${\cal O}(d\varepsilon^{-2}\log^3(d\varepsilon^{-2}))$, or ${\cal O}(d\varepsilon^{-2})$ under additional assumptions on $U$. More precisely, up to universal constants, an appropriate choice of the parameters leads to a cost controlled by $(\bar{\lambda}_U\vee 1)^2\,\underline{\lambda}_U^{-3}\, d\varepsilon^{-2}$ (where $\bar{\lambda}_U$ and $\underline{\lambda}_U$ denote, respectively, the supremum of the largest eigenvalue and the infimum of the smallest eigenvalue of $D^2U$). We finally complement these theoretical results with some numerical illustrations, including comparisons with other algorithms in Bayesian learning and a first look at the non-strongly convex setting.
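To make the idea of a multilevel occupation measure concrete, here is a minimal Python sketch of one plausible form of the estimator: a pathwise average of $f$ along a coarse Euler scheme with step $\gamma_0$, plus telescoping corrections built from pairs of Euler schemes with steps $\gamma_r$ and $\gamma_{r-1}$ driven by the same Brownian increments. The function name `multilevel_langevin_average`, the synchronous coupling, the common starting point, the absence of burn-in, and the geometrically decreasing horizons used in the example are all illustrative assumptions; the abstract does not specify these choices, and the paper's parameter tuning may differ.

```python
import numpy as np


def euler_step(x, grad_U, gamma, dW):
    """One Euler step of dX = -grad U(X) dt + sqrt(2) dW with step gamma."""
    return x - gamma * grad_U(x) + np.sqrt(2.0) * dW


def multilevel_langevin_average(f, grad_U, x0, gamma0, R, horizons, rng):
    """Multilevel pathwise average of f (illustrative sketch).

    horizons[r] is the time horizon T_r of level r, r = 0..R.
    These horizons are an assumption of this sketch, not the paper's tuning.
    """
    x0 = np.asarray(x0, dtype=float)
    d = x0.size

    # Level 0: plain pathwise average along the coarsest scheme (step gamma0).
    n0 = max(1, int(round(horizons[0] / gamma0)))
    x, acc = x0.copy(), 0.0
    for _ in range(n0):
        acc += f(x)
        x = euler_step(x, grad_U, gamma0, np.sqrt(gamma0) * rng.standard_normal(d))
    estimate = acc / n0

    # Levels 1..R: correction = (fine average) - (coarse average),
    # with both schemes driven by the same Brownian increments.
    for r in range(1, R + 1):
        g_fine = gamma0 * 2.0 ** (-r)
        g_coarse = 2.0 * g_fine
        n_coarse = max(1, int(round(horizons[r] / g_coarse)))
        xf, xc = x0.copy(), x0.copy()
        acc_f, acc_c = 0.0, 0.0
        for _ in range(n_coarse):
            dW1 = np.sqrt(g_fine) * rng.standard_normal(d)
            dW2 = np.sqrt(g_fine) * rng.standard_normal(d)
            acc_c += f(xc)
            acc_f += f(xf)
            xf = euler_step(xf, grad_U, g_fine, dW1)
            acc_f += f(xf)
            xf = euler_step(xf, grad_U, g_fine, dW2)
            # The coarse step reuses the sum of the two fine increments.
            xc = euler_step(xc, grad_U, g_coarse, dW1 + dW2)
        estimate += acc_f / (2 * n_coarse) - acc_c / n_coarse
    return estimate


# Example: Ornstein-Uhlenbeck case U(x) = |x|^2 / 2, so pi = N(0, I_d)
# and the target value is E_pi[|x|^2] = d.
rng = np.random.default_rng(0)
d, R = 2, 3
horizons = [2000.0 * 2.0 ** (-r) for r in range(R + 1)]  # assumed T_r
estimate = multilevel_langevin_average(
    lambda x: float(x @ x), lambda x: x, np.zeros(d), 0.5, R, horizons, rng
)
print(estimate)
```

In expectation the sum telescopes, so the bias is roughly that of the finest scheme (step $\gamma_R$), while most of the simulated time is spent on the cheap coarse level. The coupling keeps the correction terms small, which is why shorter horizons are used at the finer levels in this example.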
