Multi-fidelity No-U-Turn Sampling
Kislaya Ravi, Tobias Neckel, Hans-Joachim Bungartz
TL;DR
The paper addresses the high computational cost of gradient-based MCMC for expensive models by introducing MFNUTS, a framework that uses a multi-fidelity Gaussian Process surrogate to approximate derivatives and guide No-U-Turn sampling. It combines a non-linear multi-fidelity GP construction (including NARGP and derivative fusion variants) with a Delayed Acceptance mechanism to maintain ergodicity with respect to the high-fidelity density. In the offline phase, the surrogate is trained on a small set of high- and low-fidelity evaluations and the step size is tuned via dual averaging on the surrogate; in the online phase, sampling uses an acceptance step that references the high-fidelity target. Numerical results on the Rosenbrock function, an 8-dimensional correlated Gaussian, and a groundwater flow inverse problem show that MFNUTS achieves higher sampling efficiency (mESS) per high-fidelity evaluation than MH, HMC, NUTS, and DRAM, demonstrating substantial cost savings without compromising posterior accuracy. The work highlights the value of surrogate-driven gradient proposals in expensive Bayesian inference and points to future extensions with alternative surrogates and augmented rejection schemes.
Abstract
Markov Chain Monte Carlo (MCMC) methods often take many iterations to converge for highly correlated or high-dimensional target density functions. Methods such as Hamiltonian Monte Carlo (HMC) or No-U-Turn Sampling (NUTS) use the first-order derivative of the density function to tackle these issues. However, the calculation of the derivative is a bottleneck for computationally expensive models. We propose to first build a multi-fidelity Gaussian Process (GP) surrogate. The building block of the multi-fidelity surrogate is a hierarchy of models of decreasing approximation error and increasing computational cost. The generated multi-fidelity surrogate is then used to approximate the derivative. The majority of the computation is assigned to the cheap models, thereby reducing the overall computational cost. The derivative of the multi-fidelity surrogate is used to explore the target density function and generate proposals. We accept or reject the proposals with the Metropolis-Hastings criterion evaluated on the highest-fidelity model, which ensures that the proposed method is ergodic with respect to the highest-fidelity density function. We apply the proposed method to three test cases, including some well-known benchmarks, to compare it with existing methods and show that multi-fidelity No-U-Turn sampling outperforms other methods.
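The core idea of the abstract, proposals driven by a cheap approximate derivative and an accept/reject step on the expensive density, can be illustrated with a minimal sketch. It is not the authors' MFNUTS: it uses plain HMC with a fixed trajectory length instead of NUTS, and the multi-fidelity GP is replaced by a deliberately inexact analytic gradient. All names (`log_density_hf`, `surrogate_grad`, `surrogate_hmc`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def log_density_hf(x, precision):
    # Stand-in for the expensive high-fidelity log density (a 2-d Gaussian here).
    return -0.5 * x @ precision @ x

def surrogate_grad(x, approx_precision):
    # Stand-in for the surrogate derivative: gradient of a slightly wrong Gaussian.
    return -approx_precision @ x

def leapfrog(x, p, grad_fn, step_size, n_steps):
    # Leapfrog integration driven only by the (cheap) surrogate gradient.
    x, p = x.copy(), p.copy()
    p += 0.5 * step_size * grad_fn(x)
    for i in range(n_steps):
        x += step_size * p
        if i < n_steps - 1:
            p += step_size * grad_fn(x)
    p += 0.5 * step_size * grad_fn(x)
    return x, p

def surrogate_hmc(log_p, grad_fn, x0, n_samples, step_size, n_steps, rng):
    x = np.asarray(x0, dtype=float)
    logp_x = log_p(x)
    samples = np.empty((n_samples, x.size))
    n_accept = 0
    for k in range(n_samples):
        p0 = rng.standard_normal(x.size)
        x_new, p_new = leapfrog(x, p0, grad_fn, step_size, n_steps)
        # One high-fidelity evaluation per proposal; no high-fidelity gradients.
        logp_new = log_p(x_new)
        # Metropolis-Hastings test on the Hamiltonian built from the
        # high-fidelity density, so the chain targets that density.
        log_alpha = (logp_new - 0.5 * p_new @ p_new) - (logp_x - 0.5 * p0 @ p0)
        if np.log(rng.uniform()) < log_alpha:
            x, logp_x = x_new, logp_new
            n_accept += 1
        samples[k] = x
    return samples, n_accept / n_samples

# Assumed toy setup: true correlation 0.8, surrogate believes 0.7.
true_cov = np.array([[1.0, 0.8], [0.8, 1.0]])
precision = np.linalg.inv(true_cov)
approx_precision = np.linalg.inv(np.array([[1.0, 0.7], [0.7, 1.0]]))

rng = np.random.default_rng(0)
samples, acc_rate = surrogate_hmc(
    lambda x: log_density_hf(x, precision),
    lambda x: surrogate_grad(x, approx_precision),
    x0=np.zeros(2), n_samples=8000, step_size=0.25, n_steps=4, rng=rng,
)
sample_cov = np.cov(samples.T)
```

Because leapfrog stays reversible and volume-preserving for any position-dependent gradient field, an inexact surrogate gradient lowers the acceptance rate but does not bias the chain: in this toy setup `sample_cov` should approach `true_cov` rather than the surrogate's covariance.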
