Table of Contents
Fetching ...

Multigrid Monte Carlo Revisited: Theory and Bayesian Inference

Yoshihito Kazashi, Eike H. Müller, Robert Scheichl

TL;DR

This work revisits Multigrid Monte Carlo (MGMC) for sampling Gaussian random fields, reframing the method as a solver-sampler entanglement that leverages a multilevel hierarchy to overcome critical slowing down. The authors extend MGMC to linear Bayesian inverse problems with low-rank perturbations, introduce bespoke random smoothers, and prove grid-size independent convergence of both the mean and covariance to the target Gaussian, along with posterior-quantity estimators whose Monte Carlo error remains mesh-independent. They show the algorithm is optimal in the continuum limit, with online costs growing linearly in the problem size and integrated autocorrelation times that do not deteriorate with refinement. Numerical experiments in 2D and 3D confirm the theory: MGMC outperforms Gibbs sampling and, on larger problems, even Cholesky-based sampling, especially for rough Matérn-like fields where the precision operator is sparse. The results indicate MGMC is a robust and scalable tool for Bayesian inference with Gaussian priors, including non-stationary covariances and low-rank updates, with clear potential for parallel implementations and extensions to broader classes of priors and likelihoods.

Abstract

Gaussian random fields play an important role in many areas of science and engineering. In practice, they are often simulated by sampling from a high-dimensional multivariate normal distribution, which arises from the discretisation of a suitable precision operator. Existing methods such as Cholesky factorization and Gibbs sampling become prohibitively expensive on fine meshes due to their high computational cost. In this work, we revisit the Multigrid Monte Carlo (MGMC) algorithm developed by Goodman & Sokal (Physical Review D 40.6, 1989) in the quantum physics context. While the authors of this paper conclude that MGMC does not overcome critical slowing down in simulations of field theories near phase transitions, we demonstrate here that it has the potential to significantly accelerate sampling in spatial statistics. The class of Gaussian Random Fields we consider includes those with Matérn covariance, but is more general in that it also allows for non-stationary covariance functions. To show that MGMC can overcome the limitation of existing methods, we establish a grid-size-independent convergence theory based on the link between linear solvers and samplers for multivariate normal distributions, drawing on standard multigrid convergence arguments. We then apply this theory to linear Bayesian inverse problems. This application is achieved by extending the standard multigrid theory to operators with a low-rank perturbation. Moreover, we develop a novel bespoke random smoother which takes care of the low-rank updates that arise in constructing posterior moments. In particular, we prove that Multigrid Monte Carlo is algorithmically optimal in the limit of the grid-size going to zero. Numerical results support our theory, demonstrating that Multigrid Monte Carlo can be significantly more efficient than alternative methods when applied in a Bayesian setting.

Multigrid Monte Carlo Revisited: Theory and Bayesian Inference

TL;DR

This work revisits Multigrid Monte Carlo (MGMC) for sampling Gaussian random fields, reframing the method as a solver-sampler entanglement that leverages a multilevel hierarchy to overcome critical slowing down. The authors extend MGMC to linear Bayesian inverse problems with low-rank perturbations, introduce bespoke random smoothers, and prove grid-size independent convergence of both the mean and covariance to the target Gaussian, along with posterior-quantity estimators whose Monte Carlo error remains mesh-independent. They show the algorithm is optimal in the continuum limit, with online costs growing linearly in the problem size and integrated autocorrelation times that do not deteriorate with refinement. Numerical experiments in 2D and 3D confirm the theory: MGMC outperforms Gibbs sampling and, on larger problems, even Cholesky-based sampling, especially for rough Matérn-like fields where the precision operator is sparse. The results indicate MGMC is a robust and scalable tool for Bayesian inference with Gaussian priors, including non-stationary covariances and low-rank updates, with clear potential for parallel implementations and extensions to broader classes of priors and likelihoods.

Abstract

Gaussian random fields play an important role in many areas of science and engineering. In practice, they are often simulated by sampling from a high-dimensional multivariate normal distribution, which arises from the discretisation of a suitable precision operator. Existing methods such as Cholesky factorization and Gibbs sampling become prohibitively expensive on fine meshes due to their high computational cost. In this work, we revisit the Multigrid Monte Carlo (MGMC) algorithm developed by Goodman & Sokal (Physical Review D 40.6, 1989) in the quantum physics context. While the authors of this paper conclude that MGMC does not overcome critical slowing down in simulations of field theories near phase transitions, we demonstrate here that it has the potential to significantly accelerate sampling in spatial statistics. The class of Gaussian Random Fields we consider includes those with Matérn covariance, but is more general in that it also allows for non-stationary covariance functions. To show that MGMC can overcome the limitation of existing methods, we establish a grid-size-independent convergence theory based on the link between linear solvers and samplers for multivariate normal distributions, drawing on standard multigrid convergence arguments. We then apply this theory to linear Bayesian inverse problems. This application is achieved by extending the standard multigrid theory to operators with a low-rank perturbation. Moreover, we develop a novel bespoke random smoother which takes care of the low-rank updates that arise in constructing posterior moments. In particular, we prove that Multigrid Monte Carlo is algorithmically optimal in the limit of the grid-size going to zero. Numerical results support our theory, demonstrating that Multigrid Monte Carlo can be significantly more efficient than alternative methods when applied in a Bayesian setting.
Paper Structure (62 sections, 29 theorems, 234 equations, 4 figures, 2 tables, 6 algorithms)

This paper contains 62 sections, 29 theorems, 234 equations, 4 figures, 2 tables, 6 algorithms.

Key Result

Proposition 2.2

Let $u\in V$ satisfy eq:eq-given-by-A and let Assumption assu:WtoH-best-Vell-approx hold. Suppose furthermore that $u_{\ell}\in V_{\ell}$ satisfies Then there exists a constant $\tilde{C}>0$ independent of $\ell$ such that

Figures (4)

  • Figure 1: Convergence of the mean $\mu_L^{(m)}$ and variance $(\sigma_L^{(m)})^2$ for the Gibbs sampler (crosses) and for the MGMC sampler (filled circles). The plot shows the estimators $\widehat{R}_L^{(m)}$ (blue solid lines) and $\widehat{Z}_L^{(m)}$ (red dashed lines) for the quantities defined in \ref{['eqn:convergence_rate_definition']}. The grid is of size $128\times 128$ and the number of independent Markov chains is $n_{\text{samples}}=100,000$.
  • Figure 2: Dependence of the convergence rates $\widehat{\rho}_L$ (left) and $\widehat{\zeta}_L$ (right) defined in \ref{['eqn:convergence_rate_definitions']} on the resolution for different correlation lengths $\kappa^{-1}$. For $\kappa^{-1}=0.01$ the mean $\mu_L^{(m)}$ converged so rapidly for both samplers that the convergence rate $\widehat{\rho}_L$ could not be measured reliably with the given statistics. While one would expect to see $2\times 3$ different curves for $\widehat{\rho}_L$ and $2\times 4$ curves for $\widehat{\zeta}_L$, the results for $\kappa=1.0$ and $\kappa=10.0$ lie on top of each other in the left figure and they almost overlap in the right figure, in particular for the MGMC sampler.
  • Figure 3: Lagged autocorrelation function $\widehat{\Gamma}_z(t)/\widehat{\Gamma}_z(0)$ for fixed correlation length $\kappa^{-1}=0.1$ (left) and IACT for different $\kappa^{-1}$ (right).
  • Figure 4: Estimated root mean squared error $\widehat{\Delta}_L^{(M)}$ as defined in \ref{['eqn:RMSE_estimator']} as a function of the length of the Markov chain $M$ for different grid sizes.

Theorems & Definitions (64)

  • Example 2.1
  • Proposition 2.2: Aubin and Nitsche
  • Remark 2.3
  • Lemma 3.1: Symmetrised Random Smoother
  • proof
  • Theorem 3.2: Cost of Multigrid Monte Carlo update
  • proof
  • Remark 3.3
  • Remark 3.4
  • Theorem 3.5: MGMC setup cost
  • ...and 54 more