Table of Contents
Fetching ...

Efficient hyperparameter estimation in Bayesian inverse problems using sample average approximation

Julianne Chung, Scot M. Miller, Malena Sabate Landman, Arvind K. Saibaba

TL;DR

This paper tackles hyperparameter estimation in hierarchical Bayesian inverse problems with a linear forward model and Gaussian noise, where hyperparameters govern the Matérn-based priors and the noise model. It develops a sample average approximation (SAA) framework that replaces expensive log-determinant evaluations with Monte Carlo estimators, accelerated by a novel preconditioned Lanczos method. A parametric low-rank preconditioner that updates cheaply with hyperparameters, along with a gradient estimation strategy that reuses Lanczos information, enables scalable MAP estimation of the marginalized posterior. The approach is validated on static and dynamic seismic tomography problems, demonstrating accurate objective and gradient evaluations at reduced computational cost and improved reconstruction quality. The method advances scalable Bayesian hyperparameter estimation for large-scale linear inverse problems in geophysical applications.

Abstract

In Bayesian inverse problems, it is common to consider several hyperparameters that define the prior and the noise model that must be estimated from the data. In particular, we are interested in linear inverse problems with additive Gaussian noise and Gaussian priors defined using Matérn covariance models. In this case, we estimate the hyperparameters using the maximum a posteriori (MAP) estimate of the marginalized posterior distribution. However, this is a computationally intensive task since it involves computing log determinants. To address this challenge, we consider a stochastic average approximation (SAA) of the objective function and use the preconditioned Lanczos method to compute efficient approximations of the function and gradient evaluations. We propose a new preconditioner that can be updated cheaply for new values of the hyperparameters and an approach to compute approximations of the gradient evaluations, by reutilizing information from the function evaluations. We demonstrate the performance of our approach on static and dynamic seismic tomography problems.

Efficient hyperparameter estimation in Bayesian inverse problems using sample average approximation

TL;DR

This paper tackles hyperparameter estimation in hierarchical Bayesian inverse problems with a linear forward model and Gaussian noise, where hyperparameters govern the Matérn-based priors and the noise model. It develops a sample average approximation (SAA) framework that replaces expensive log-determinant evaluations with Monte Carlo estimators, accelerated by a novel preconditioned Lanczos method. A parametric low-rank preconditioner that updates cheaply with hyperparameters, along with a gradient estimation strategy that reuses Lanczos information, enables scalable MAP estimation of the marginalized posterior. The approach is validated on static and dynamic seismic tomography problems, demonstrating accurate objective and gradient evaluations at reduced computational cost and improved reconstruction quality. The method advances scalable Bayesian hyperparameter estimation for large-scale linear inverse problems in geophysical applications.

Abstract

In Bayesian inverse problems, it is common to consider several hyperparameters that define the prior and the noise model that must be estimated from the data. In particular, we are interested in linear inverse problems with additive Gaussian noise and Gaussian priors defined using Matérn covariance models. In this case, we estimate the hyperparameters using the maximum a posteriori (MAP) estimate of the marginalized posterior distribution. However, this is a computationally intensive task since it involves computing log determinants. To address this challenge, we consider a stochastic average approximation (SAA) of the objective function and use the preconditioned Lanczos method to compute efficient approximations of the function and gradient evaluations. We propose a new preconditioner that can be updated cheaply for new values of the hyperparameters and an approach to compute approximations of the gradient evaluations, by reutilizing information from the function evaluations. We demonstrate the performance of our approach on static and dynamic seismic tomography problems.

Paper Structure

This paper contains 24 sections, 1 theorem, 36 equations, 4 figures, 4 tables, 3 algorithms.

Key Result

Proposition 1

Let $\epsilon,\delta \in (0,1)$ denote two fixed user-defined parameters representing the absolute error and failure probability respectively. Furthermore, let the number of samples $n_{\rm mc}$ and the number of Lanczos iterations, $k$, be chosen to satisfy the inequalities Then, with probability at least $1-\delta$ where $\{{\bf w}_t\}_{t=1}^{n_{\rm mc}}$ are independent Rademacher random vecto

Figures (4)

  • Figure 1: In the left panel are relative errors of the objective function provided in terms of the number of Monte Carlo samples. The solid line corresponds to the average over 10 runs, and the bars represent $1$ standard deviation. In the right panel, relative objective function errors at each iteration of the optimization scheme are provided.
  • Figure 2: In the left panel is the ground truth image for the static seismic inverse example. The middle and right panels contain image reconstructions corresponding to the initial hyperparameters ${\boldsymbol{\theta}}_0$ and the optimized hyperparameters ${\boldsymbol{\theta}}_{\rm prec}$ respectively.
  • Figure 3: Top row: exact solutions at timepoints $t$ consisting of two rotating Gaussians. Bottom row: noisy measurements. The proportions of the images are accurate but, to aid visualization, the relative size between images and measurements is not.
  • Figure 4: Reconstructions using the seismic inversion model corresponding to the initial hyperparameters ${\boldsymbol{\theta}}_{0}=(0.59 \,1\,1\,1)$ (top row), the optimal parameters computed using approximate function and gradient evaluations based on the new SAA methods without preconditioning ${\boldsymbol{\theta}}_{\bf mc}~=~(0.15 \,647.51\, 0.21\,0.00)$ (middle row), and the optimal parameters computed with preconditioning ${\boldsymbol{\theta}}_{\bf prec}=(0.57 \, 0.19 \, 0.47 \, 0.32)$ (bottom row).

Theorems & Definitions (2)

  • Proposition 1
  • proof