The shifted ODE method for underdamped Langevin MCMC
James Foster, Terry Lyons, Harald Oberhauser
TL;DR
This work introduces the shifted ODE method, a high-order approximation of the underdamped Langevin diffusion for MCMC sampling that needs only gradient evaluations of the target log-density. By recasting the SDE as a controlled differential equation driven by a Brownian-bridge-based path, the authors derive two practical high-order discretizations, SORT and SOFA, which require only a few gradient evaluations per step. They establish non-asymptotic 2-Wasserstein error bounds that improve with additional smoothness of the target log-density, and report faster convergence than other unadjusted Langevin MCMC algorithms in an experiment with a logistic regression target. The approach offers a route to faster, high-accuracy unadjusted Langevin MCMC that leverages ODE solvers while keeping the computational cost per step tractable.
Abstract
In this paper, we consider the underdamped Langevin diffusion (ULD) and propose a numerical approximation using its associated ordinary differential equation (ODE). When used as a Markov chain Monte Carlo (MCMC) algorithm, we show that the ODE approximation achieves a $2$-Wasserstein error of $\varepsilon$ in $\mathcal{O}\big(d^{\frac{1}{3}}/\varepsilon^{\frac{2}{3}}\big)$ steps under the standard smoothness and strong convexity assumptions on the target distribution. This matches the complexity of the randomized midpoint method proposed by Shen and Lee [NeurIPS 2019], which was shown to be order-optimal by Cao, Lu and Wang. However, the main feature of the proposed numerical method is that it can utilize additional smoothness of the target log-density $f$. More concretely, we show that the ODE approximation achieves a $2$-Wasserstein error of $\varepsilon$ in $\mathcal{O}\big(d^{\frac{2}{5}}/\varepsilon^{\frac{2}{5}}\big)$ and $\mathcal{O}\big(\sqrt{d}/\varepsilon^{\frac{1}{3}}\big)$ steps when Lipschitz continuity is assumed for the Hessian and third derivative of $f$, respectively. By discretizing this ODE using a third-order Runge-Kutta method, we obtain a practical MCMC method that uses just two additional gradient evaluations per step. In our experiment, where the target comes from a logistic regression, this method shows faster convergence than other unadjusted Langevin MCMC algorithms.
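To get a feel for how the three step-count rates quoted above depend on the dimension $d$ and the target error $\varepsilon$, the following minimal sketch evaluates each expression with all constants dropped. The function name `step_scalings` and the dictionary keys are hypothetical labels chosen for illustration. Because the $\mathcal{O}(\cdot)$ notation hides constants, the returned values are not actual step counts; they only indicate how each bound grows.

```python
import math


def step_scalings(d: float, eps: float) -> dict:
    """Evaluate the three rates from the abstract with constants dropped.

    These are growth rates only, not step counts: the O(.) bounds hide
    constants that depend on the problem (e.g. smoothness and strong
    convexity parameters), which this sketch ignores.
    """
    if d <= 0 or eps <= 0:
        raise ValueError("d and eps must be positive")
    return {
        # Standard smoothness and strong convexity: d^(1/3) / eps^(2/3)
        "gradient_lipschitz": d ** (1 / 3) / eps ** (2 / 3),
        # Additionally Lipschitz Hessian: d^(2/5) / eps^(2/5)
        "hessian_lipschitz": d ** (2 / 5) / eps ** (2 / 5),
        # Additionally Lipschitz third derivative: sqrt(d) / eps^(1/3)
        "third_derivative_lipschitz": math.sqrt(d) / eps ** (1 / 3),
    }


if __name__ == "__main__":
    for name, value in step_scalings(d=1000, eps=1e-3).items():
        print(f"{name}: {value:.1f}")
```

Note the trade-off visible in the exponents: the bounds that use more smoothness depend more weakly on $\varepsilon$ but more strongly on $d$, so which expression is smallest depends on the regime.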
