Bayesian D-Optimal Experimental Designs via Column Subset Selection

Srinivas Eswar; Vishwas Rao; Arvind K. Saibaba

TL;DR

The paper reframes Bayesian D-optimal experimental design for linear inverse problems as a Column Subset Selection Problem and develops efficient, provably reliable algorithms based on the Golub–Klema–Stewart framework. It introduces deterministic CSSP methods and multiple randomized, adjoint-free variants (notably RAF-OED) along with a data completion technique (bdeim), providing rigorous bounds and cost analyses dominated by truncated SVDs. The approach yields scalable sensor placement with strong performance guarantees, combining matrix-free computations and parallelizability, and it is validated on 2D heat-equation and seismic travel-time tomography examples. Overall, the work offers a practical, tunable toolkit for OED in large-scale Bayesian inverse problems, with potential extensions to other criteria and non-linear settings.

Abstract

This paper tackles optimal sensor placement for Bayesian linear inverse problems, a popular version of the more general Optimal Experimental Design (OED) problem, using the D-optimality criterion. We do so by establishing connections between sensor placement and the Column Subset Selection Problem (CSSP), a well-studied problem in Numerical Linear Algebra (NLA). In particular, we use the Golub-Klema-Stewart (GKS) approach, which involves computing a truncated Singular Value Decomposition (SVD) followed by a pivoted QR factorization on the right singular vectors. The algorithms are further accelerated by using randomization both to compute the low-rank approximation and to sample the indices. The resulting algorithms are robust, computationally efficient, amenable to parallelization, require virtually no parameter tuning, and come with strong theoretical guarantees. One of the proposed algorithms is also adjoint-free, which is beneficial in situations where the adjoint is expensive to evaluate or is not available. Additionally, we develop a method for data completion without solving the inverse problem. Numerical experiments on model inverse problems involving the heat equation and seismic tomography in two spatial dimensions demonstrate the performance of our approaches.
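The GKS recipe described in the abstract (truncated SVD, then pivoted QR on the right singular vectors) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that the columns of a dense matrix `A` correspond to candidate sensors, and that the D-optimality criterion has the form $\log\det(\mathbf{I} + \mathbf{A}_S^\top \mathbf{A}_S)$, which this summary does not reproduce. The names `gks_select` and `d_optimality` are hypothetical.

```python
import numpy as np
from scipy.linalg import qr, svd


def gks_select(A, k):
    """Select k column indices of A in the GKS style (illustrative sketch).

    Assumes k <= min(A.shape) and that columns of A index candidate sensors.
    """
    # Thin SVD; the rows of Vt are the right singular vectors of A.
    _, _, Vt = svd(A, full_matrices=False)
    Vk_t = Vt[:k, :]  # k x n block holding the top-k right singular vectors

    # Column-pivoted QR on V_k^T; the first k pivots are the selected columns.
    _, _, piv = qr(Vk_t, mode="economic", pivoting=True)
    return np.sort(piv[:k])


def d_optimality(A, S):
    """Assumed criterion: log det(I + A_S^T A_S) for the column subset S."""
    A_S = A[:, S]
    _, logdet = np.linalg.slogdet(np.eye(A_S.shape[1]) + A_S.T @ A_S)
    return logdet


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 50))  # synthetic stand-in for the operator
    S = gks_select(A, k=5)
    print("selected columns:", S)
    print("criterion on subset:", d_optimality(A, S))
```

The dense `svd` call is used only for brevity; for large problems the abstract indicates that the low-rank approximation is instead computed with randomized methods.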

Paper Structure (49 sections, 8 theorems, 72 equations, 9 figures, 5 tables, 3 algorithms)

Introduction
Contributions and Features
Preliminaries
Notation and matrix preliminaries
Background on inverse problems and OED
Challenges and related work
Other related work
Rank-revealing QR factorizations
Column subset selection for OED
Interpreting OED as CSSP
Connection to maximum-volume
Connection to RRQR
The GKS approach
Structural bounds on the D-optimality of C
Deterministic CSSP algorithm for OED
...and 34 more sections

Key Result

Proposition 3.1

The problem of optimizing the D-optimality objective function over all index sets $S$ corresponding to $k$ columns of $\mathbf{A}$ is NP-hard.
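To give a feel for why an exact solution is impractical, the sketch below enumerates every $k$-column subset of a tiny matrix. It is only an illustration of the combinatorial search space, not part of the paper: it assumes the objective is $\log\det(\mathbf{I} + \mathbf{A}_S^\top \mathbf{A}_S)$ and that it is maximized, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb

import numpy as np


def d_optimality(A, S):
    """Assumed criterion: log det(I + A_S^T A_S) for the column subset S."""
    A_S = A[:, list(S)]
    _, logdet = np.linalg.slogdet(np.eye(A_S.shape[1]) + A_S.T @ A_S)
    return logdet


def brute_force_d_optimal(A, k):
    """Exhaustively search all k-column subsets (feasible only for tiny n)."""
    n = A.shape[1]
    best_S, best_val = None, -np.inf
    for S in combinations(range(n), k):
        val = d_optimality(A, S)
        if val > best_val:
            best_S, best_val = S, val
    return best_S, best_val


if __name__ == "__main__":
    # Toy matrix: three informative columns followed by two empty ones.
    A = np.zeros((3, 5))
    A[0, 0], A[1, 1], A[2, 2] = 3.0, 2.0, 1.0
    print(brute_force_d_optimal(A, k=2))
    # The number of subsets grows combinatorially with the candidate count.
    print(comb(256, 20))
```

The last line counts the subsets for 20 sensors out of 256 candidates, the sizes quoted in the seismic tomography figure caption; exhaustive search at that scale is out of reach, which is the practical motivation for approximation algorithms.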

Figures (9)

Figure 1: Problem setup for the Heat problem. The left panel shows the true initial conditions (Franke's function). The middle panel shows the true state along with the sensor locations as black squares. The red sensors are the ones selected by RandGKS. The right shows the reconstruction from these 30 sensors.
Figure 1: D-optimal criterion and relative error in the RandGKS algorithm with increasing number of sensors. Relative error stagnates around 30 sensors while D-optimality keeps increasing with more sensors.
Figure 2: Problem setup for the seismic tomography problem. The left panel shows the true image along with the blue source on its right boundary. 256 receivers are uniformly placed on the left and top boundaries (a subset of receivers are denoted by red stars). The 20 red locations are selected by RandGKS. A Fresnel zone for one selected source-receiver pair is shown in the middle panel. The right panel contains the reconstruction from the selected sensors.
Figure 2: Evaluating the RandGKS algorithm against random sensor placements in the Seismic problem. 100 random sensor selections, with $k=10$ or $k=50$, are used to plot the D-optimality and relative error histograms. RandGKS always has a high D-optimality and reasonable relative error for all $k$.
Figure 3: D-optimality and relative error changes in the RandGKS algorithm with an increasing number of sensors $k$ for the Heat problem. The relative error decreases and the D-optimal criterion increases with increasing $k$. Notice that we achieve good reconstructions with as few as $20$ sensors with respect to the full operator but D-optimality is smaller than that of the full operator.
...and 4 more figures

Theorems & Definitions (21)

Proposition 3.1
Proof 1
Theorem 3.2
Proof 2
Corollary 3.3
Proof 3
Theorem 4.1: RAF-OED
Proof 4
Theorem 5.1: Random sampling
Proof 5
...and 11 more
