Efficient hyperparameter estimation in Bayesian inverse problems using sample average approximation
Julianne Chung, Scot M. Miller, Malena Sabate Landman, Arvind K. Saibaba
TL;DR
This paper tackles hyperparameter estimation in hierarchical Bayesian inverse problems with a linear forward model and Gaussian noise, where hyperparameters govern the Matérn-based priors and the noise model. It develops a sample average approximation (SAA) framework that replaces expensive log-determinant evaluations with Monte Carlo estimators, accelerated by a novel preconditioned Lanczos method. A parametric low-rank preconditioner that updates cheaply with hyperparameters, along with a gradient estimation strategy that reuses Lanczos information, enables scalable MAP estimation of the marginalized posterior. The approach is validated on static and dynamic seismic tomography problems, demonstrating accurate objective and gradient evaluations at reduced computational cost and improved reconstruction quality. The method advances scalable Bayesian hyperparameter estimation for large-scale linear inverse problems in geophysical applications.
Abstract
In Bayesian inverse problems, it is common to consider several hyperparameters that define the prior and the noise model that must be estimated from the data. In particular, we are interested in linear inverse problems with additive Gaussian noise and Gaussian priors defined using Matérn covariance models. In this case, we estimate the hyperparameters using the maximum a posteriori (MAP) estimate of the marginalized posterior distribution. However, this is a computationally intensive task since it involves computing log determinants. To address this challenge, we consider a stochastic average approximation (SAA) of the objective function and use the preconditioned Lanczos method to compute efficient approximations of the function and gradient evaluations. We propose a new preconditioner that can be updated cheaply for new values of the hyperparameters and an approach to compute approximations of the gradient evaluations, by reutilizing information from the function evaluations. We demonstrate the performance of our approach on static and dynamic seismic tomography problems.
