
Data selection: at the interface of PDE-based inverse problem and randomized linear algebra

Kathrin Hellmuth, Ruhui Jin, Qin Li, Stephen J. Wright

TL;DR

This survey explores data selection for PDE-based inverse problems through the lens of randomized numerical linear algebra (RNLA). It identifies the core challenge, namely that the parameter space and the design space are both infinite-dimensional, and shows how RNLA techniques (such as matrix sketching, randomized SVD, and Hessian/subset strategies) can be tailored to the tensorized sensitivities that arise from PDE linearization. The paper connects PDE-constrained optimization and Bayesian design with RNLA, proposing qualitative data-selection tools that prioritize efficiency while preserving the information essential for reconstruction. It also outlines theoretical results, practical algorithms, and open questions, especially regarding nonlinear extensions and infinite-dimensional formulations, with implications for scalable design in physics and engineering contexts.

Abstract

All inverse problems rely on data to recover unknown parameters, yet not all data are equally informative. This raises the central question of data selection. A distinctive challenge in PDE-based inverse problems is their inherently infinite-dimensional nature: both the parameter space and the design space are infinite, which greatly complicates the selection process. Somewhat unexpectedly, randomized numerical linear algebra (RNLA), originally developed in very different contexts, has provided powerful tools for addressing this challenge. These methods are inherently probabilistic, with guarantees typically stating that information is preserved with probability at least $1-p$ when using $N$ randomly selected, weighted samples. Here, the notion of information can take different mathematical forms depending on the setting. In this review, we survey the problem of data selection in PDE-based inverse problems, emphasize its unique infinite-dimensional aspects, and highlight how RNLA strategies have been adapted and applied in this context.
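
To make the phrase "randomly selected, weighted samples" concrete, here is a minimal sketch of one standard RNLA construction: sampling rows with probability proportional to their squared norms and rescaling them, so that the sampled Gram matrix is an unbiased estimate of the full one. The matrix `A`, the function name `sample_rows`, and all sizes are illustrative assumptions. `A` is a random stand-in for a sensitivity matrix whose rows correspond to candidate measurements, not an operator taken from the paper.

```python
import numpy as np


def sample_rows(A, N, rng):
    """Draw N rows of A with probability proportional to squared row norms.

    Each sampled row is rescaled by 1 / sqrt(N * p_i), so that
    (SA)^T (SA) equals A^T A in expectation.
    """
    row_norms_sq = np.sum(A**2, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()
    idx = rng.choice(A.shape[0], size=N, replace=True, p=probs)
    weights = 1.0 / np.sqrt(N * probs[idx])
    return weights[:, None] * A[idx]


rng = np.random.default_rng(0)
A = rng.standard_normal((5000, 10))   # hypothetical: 5000 candidate data, 10 parameters
SA = sample_rows(A, 500, rng)         # keep N = 500 weighted samples

# Relative error of the sampled Gram matrix in the Frobenius norm.
rel_err = np.linalg.norm(SA.T @ SA - A.T @ A) / np.linalg.norm(A.T @ A)
print(f"relative Gram-matrix error with N=500: {rel_err:.3f}")
```

In this toy setting "information" means the Gram matrix $A^\top A$. The error shrinks roughly like $1/\sqrt{N}$, and it holds only with high probability, not deterministically.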

Paper Structure

This paper contains 24 sections, 56 equations, 12 figures.

Figures (12)

  • Figure 1: Conceptual pathway from infinite- to finite-dimensional formulations. The origin $(n,c) = (\infty,\infty)$ represents the infinite-dimensional setting, where well-posedness is established. The green arrow reduces the parameter space to finite dimension while keeping data abundant, leading to the semi-infinite case. The red arrow then reduces the data to finitely many experiments, posing the key challenges of identifiability and data selection.
  • Figure 2: Sketching mechanism.
  • Figure 3: Major classes of sketching matrix choices. Orange squares indicate nontrivial entries. Gray squares represent unsampled entries in (b), and zero-valued entries in (c). In (b) FJLT, the $(j,k)$-th entry of $\pmb{\mathcal{F}}$ is given by $\pm \frac{1}{\sqrt{n}} \omega^{jk}$, where $\omega = \exp(-\frac{2 \pi \mathrm{i}}{n})$ is a primitive $n$-th root of unity, and $jk$ is the exponent. The sign (positive or negative) is determined by a random sign-flip applied to its column.
  • Figure 4: Tensor sketching acceleration: the complexity of computing $\mathbf{S}\mathbf{A}$ breaks down to the scale of each factor matrix size.
  • Figure 5: Matrix-matrix product sampling demonstration.
  • ...and 7 more figures
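
The acceleration named in the Figure 4 caption can be illustrated with the mixed-product property of Kronecker products, $(\mathbf{S}_1 \otimes \mathbf{S}_2)(\mathbf{A}_1 \otimes \mathbf{A}_2) = (\mathbf{S}_1\mathbf{A}_1) \otimes (\mathbf{S}_2\mathbf{A}_2)$. The sketch below is a minimal check under two assumptions that are not taken from the paper: that $\mathbf{A}$ is exactly a Kronecker product of two factors, and that the sketch is a Kronecker product of two Gaussian matrices. The survey may use a different tensor structure or sketch type, and all names and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical factor matrices; A = kron(A1, A2) has shape (30*40, 4*5).
A1 = rng.standard_normal((30, 4))
A2 = rng.standard_normal((40, 5))

# Assumed Gaussian sketches acting on each factor separately.
S1 = rng.standard_normal((8, 30)) / np.sqrt(8)
S2 = rng.standard_normal((10, 40)) / np.sqrt(10)

# Naive route: form the full 80 x 1200 sketch and the full 1200 x 20 matrix.
full = np.kron(S1, S2) @ np.kron(A1, A2)

# Factored route: sketch each small factor, then combine.
factored = np.kron(S1 @ A1, S2 @ A2)

print(np.allclose(full, factored))
```

The factored route only multiplies matrices at the scale of the individual factors (here 8 x 30 by 30 x 4 and 10 x 40 by 40 x 5), which is where the savings come from when the full matrix is too large to form.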

Theorems & Definitions (1)

  • Example 1: Linearized EIT