Table of Contents
Fetching ...

Efficient iterative methods for hyperparameter estimation in large-scale linear inverse problems

Khalil A Hall-Hooper, Arvind K Saibaba, Julianne Chung, Scot M Miller

TL;DR

The paper addresses hyperparameter estimation in large-scale linear-Gaussian inverse problems by combining an empirical Bayes (EB) approach with generalized Golub-Kahan (genGK) bidiagonalization to form low-rank surrogates. It derives approximations to the EB objective and its gradient, $\tilde{\mathcal{F}}_k$ and $\widetilde{\nabla \mathcal{F}}_k$, that rely on a rank-$k$ reduction $\mathbf{A} \approx \mathbf{U}_{k+1} \mathbf{B}_k \mathbf{V}_k^\top$ and avoid explicit computation of square roots or inverses of the prior covariance. The work provides rigorous error bounds, a posteriori error estimators via Monte Carlo trace techniques, and practical stopping criteria for the genGK iterations. Numerical experiments on inverse heat transfer (1D), seismic tomography (2D), and atmospheric tomography demonstrate accurate hyperparameter recovery and substantial speedups over full-matrix computations, highlighting the method’s robustness when noise and prior variances are unknown. The approach is general and can inform fully Bayesian inference, variational Bayes, and information-theoretic design for hyperparameters in large-scale problems, enabling scalable uncertainty quantification in geophysical applications.

Abstract

We study Bayesian methods for large-scale linear inverse problems, focusing on the challenging task of hyperparameter estimation. Typical hierarchical Bayesian formulations that follow a Markov Chain Monte Carlo approach are possible for small problems with very few hyperparameters but are not computationally feasible for problems with a very large number of unknown parameters. In this work, we describe an empirical Bayesian (EB) method to estimate hyperparameters that maximize the marginal posterior, i.e., the probability density of the hyperparameters conditioned on the data, and then we use the estimated values to compute the posterior of the inverse parameters. For problems where the computation of the square root and inverse of prior covariance matrices are not feasible, we describe an approach based on the generalized Golub-Kahan bidiagonalization to approximate the marginal posterior and seek hyperparameters that minimize the approximate marginal posterior. Numerical results from seismic and atmospheric tomography demonstrate the accuracy, robustness, and potential benefits of the proposed approach.

Efficient iterative methods for hyperparameter estimation in large-scale linear inverse problems

TL;DR

The paper addresses hyperparameter estimation in large-scale linear-Gaussian inverse problems by combining an empirical Bayes (EB) approach with generalized Golub-Kahan (genGK) bidiagonalization to form low-rank surrogates. It derives approximations to the EB objective and its gradient, and , that rely on a rank- reduction and avoid explicit computation of square roots or inverses of the prior covariance. The work provides rigorous error bounds, a posteriori error estimators via Monte Carlo trace techniques, and practical stopping criteria for the genGK iterations. Numerical experiments on inverse heat transfer (1D), seismic tomography (2D), and atmospheric tomography demonstrate accurate hyperparameter recovery and substantial speedups over full-matrix computations, highlighting the method’s robustness when noise and prior variances are unknown. The approach is general and can inform fully Bayesian inference, variational Bayes, and information-theoretic design for hyperparameters in large-scale problems, enabling scalable uncertainty quantification in geophysical applications.

Abstract

We study Bayesian methods for large-scale linear inverse problems, focusing on the challenging task of hyperparameter estimation. Typical hierarchical Bayesian formulations that follow a Markov Chain Monte Carlo approach are possible for small problems with very few hyperparameters but are not computationally feasible for problems with a very large number of unknown parameters. In this work, we describe an empirical Bayesian (EB) method to estimate hyperparameters that maximize the marginal posterior, i.e., the probability density of the hyperparameters conditioned on the data, and then we use the estimated values to compute the posterior of the inverse parameters. For problems where the computation of the square root and inverse of prior covariance matrices are not feasible, we describe an approach based on the generalized Golub-Kahan bidiagonalization to approximate the marginal posterior and seek hyperparameters that minimize the approximate marginal posterior. Numerical results from seismic and atmospheric tomography demonstrate the accuracy, robustness, and potential benefits of the proposed approach.
Paper Structure (41 sections, 3 theorems, 73 equations, 11 figures, 2 tables, 3 algorithms)

This paper contains 41 sections, 3 theorems, 73 equations, 11 figures, 2 tables, 3 algorithms.

Key Result

Proposition 3.1

Define the approximation to $\bar{\mathcal{F}}$ using the truncated SVD of $\widehat{\mathbf{A}}$ as with $\mathbf{Z}_{k} = \mathbf{R}^{1/2} \left[ \widehat{\mathbf{A}}_{k} \widehat{\mathbf{A}}_{k}^{\top} + \mathbf{I}_{m} \right] \mathbf{R}^{1/2}.$ Then where $\beta_{1} = \| \mathbf{A} \boldsymbol{\mu} - \mathbf{d} \|_{\mathbf{R}^{-1}}$.

Figures (11)

  • Figure 1: 1D Heat: Relative errors and corresponding bounds. The left panel compares the overall accuracy of the objective function; the middle and right panels compare the errors of two components of the objective function: the log determinant and the quadratic terms respectively.
  • Figure 2: 1D Heat: (left) Relative error in the objective function using genGK and GSVD, and (right) plot of singular values of $\widehat{\mathbf{A}}$.
  • Figure 3: 1D Heat: On the left, we provide the observations (with and without noise). On the right, we provide the reconstruction, along with the true solution.
  • Figure 4: 1D Heat: Relative errors and corresponding bounds along the optimization trajectory for $k=22$ and $n_{mc}=10$. The left panel compares the overall accuracy in the objective function, and the middle and right panels compare the errors of the log determinant and the quadratic term respectively.
  • Figure 5: 1D Heat: Wall clock time to compute the objective function and gradient pair versus the problem size $n$.
  • ...and 6 more figures

Theorems & Definitions (7)

  • Proposition 3.1
  • proof
  • Proposition 3.2
  • proof
  • Proposition 3.3
  • proof
  • proof : Proof of Proposition \ref{['prop:trace']}