When does Metropolized Hamiltonian Monte Carlo provably outperform Metropolis-adjusted Langevin algorithm?
Yuansi Chen, Khashayar Gatmiry, Minhui Jiang
TL;DR
This work analyzes Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator for sampling from smooth densities on $\mathbb{R}^d$ that satisfy a Cheeger-type isoperimetric inequality and whose log-density has a Hessian that is Lipschitz in Frobenius norm. It establishes a gradient complexity of $\tilde{O}(d^{1/4}\,\text{polylog}(1/\varepsilon))$ to reach $\varepsilon$ error in total variation distance from a warm start. A key novelty is proving that the joint distribution of the discretized location-velocity pair remains approximately invariant across leapfrog steps; an induction over the number of leapfrog steps then yields sharp control over acceptance rates and transition overlaps. For a target that is $L$-smooth, $\gamma L^{3/2}$-strongly Hessian Lipschitz, and has isoperimetric coefficient $\psi_\mu$, the main theorem bounds the mixing time from an $M$-warm start by $\tau_{\mathrm{mix}}^{\mathrm{HMC}}(\varepsilon) = O\left( \frac{1}{K^2\eta^2\psi_\mu^2} \log\left(\frac{M}{\varepsilon}\right) \right)$ iterations, where $K$ is the number of leapfrog steps and $\eta$ is the step size. Each iteration costs $K$ gradient evaluations, so the gradient complexity is $O\left( \frac{1}{K\eta^2\psi_\mu^2} \log\left(\frac{M}{\varepsilon}\right) \right)$. With the choices $K \asymp d^{1/4}$ and $\eta \asymp L^{-1/2}d^{-1/4}$ (for constant $\gamma$), the factor $1/(K^2\eta^2)$ in the mixing time is independent of $d$ and the gradient complexity scales as $d^{1/4}$, improving on the $d^{1/2}$ scaling of the previous MALA analysis [WSC22] in the same regime. The paper also discusses examples that satisfy the assumptions, including ridge-separable functions and two-layer neural networks, and illustrates the regimes where $K>1$ yields tangible benefits.
Abstract
We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator to sample from a distribution on $\mathbb{R}^d$ whose log-density is smooth, has Lipschitz Hessian in Frobenius norm and satisfies isoperimetry. We bound the gradient complexity to reach $\varepsilon$ error in total variation distance from a warm start by $\tilde{O}(d^{1/4}\text{polylog}(1/\varepsilon))$ and demonstrate the benefit of choosing the number of leapfrog steps to be larger than 1. To surpass the previous analysis of the Metropolis-adjusted Langevin algorithm (MALA), which has $\tilde{O}(d^{1/2}\text{polylog}(1/\varepsilon))$ dimension dependence [WSC22], our proof relies on a key feature: the joint distribution of the location and velocity variables of the discretization of the continuous HMC dynamics stays approximately invariant. This key feature, when shown via induction over the number of leapfrog steps, enables us to obtain estimates on moments of various quantities that appear in the acceptance rate control of Metropolized HMC. Notably, our analysis does not require log-concavity or independence of the marginals, and only relies on an isoperimetric inequality. To illustrate the relevance of the Lipschitz Hessian in Frobenius norm assumption, several examples that fall into our framework are discussed.
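For readers less familiar with the algorithm under analysis, the following is a minimal NumPy sketch of Metropolized HMC with $K$ leapfrog steps per proposal. It is an illustration under stated assumptions, not the paper's implementation: the function names `leapfrog` and `metropolized_hmc` are hypothetical, the target is a standard Gaussian (so $L = 1$), and the choices $K \approx d^{1/4}$ and $\eta = d^{-1/4}$ set every constant to 1 purely for demonstration.

```python
import numpy as np


def leapfrog(x, v, grad_f, eta, K):
    """Run K leapfrog steps for the Hamiltonian H(x, v) = f(x) + ||v||^2 / 2."""
    x, v = x.copy(), v.copy()
    v -= 0.5 * eta * grad_f(x)       # initial half step in velocity
    for k in range(K):
        x += eta * v                 # full step in position
        if k < K - 1:
            v -= eta * grad_f(x)     # full velocity step between position updates
    v -= 0.5 * eta * grad_f(x)       # final half step in velocity
    return x, v


def metropolized_hmc(f, grad_f, x0, eta, K, n_iters, rng):
    """Sample from a density proportional to exp(-f) with Metropolized HMC."""
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iters, x.size))
    n_accept = 0
    for i in range(n_iters):
        v = rng.standard_normal(x.size)              # refresh the velocity
        x_new, v_new = leapfrog(x, v, grad_f, eta, K)
        h_old = f(x) + 0.5 * (v @ v)
        h_new = f(x_new) + 0.5 * (v_new @ v_new)
        # Accept with probability min(1, exp(h_old - h_new)).
        # If E ~ Exp(1), then P(E > t) = min(1, exp(-t)), so this is equivalent.
        if rng.exponential() > h_new - h_old:
            x = x_new
            n_accept += 1
        samples[i] = x
    return samples, n_accept / n_iters


# Illustrative run on a standard Gaussian target (assumption: L = 1, constants = 1).
d = 100
K = max(1, round(d ** 0.25))         # number of leapfrog steps, about d^(1/4)
eta = d ** -0.25                     # step size d^(-1/4)
rng = np.random.default_rng(0)
x0 = rng.standard_normal(d)          # draw from the target, i.e. a warm start
samples, acc_rate = metropolized_hmc(
    f=lambda x: 0.5 * (x @ x),
    grad_f=lambda x: x,
    x0=x0, eta=eta, K=K, n_iters=500, rng=rng,
)
print(f"K = {K}, eta = {eta:.3f}, acceptance rate = {acc_rate:.2f}")
```

As written, `leapfrog` evaluates the gradient $K+1$ times per proposal. Caching the gradient at the current state would bring this down to $K$ per iteration, which is the accounting behind the gradient complexity quoted above.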
