Unique reconstruction for discretized inverse problems: a random sketching approach via subsampling

Ruhui Jin, Qin Li, Anjali Nair, Samuel Stechmann

TL;DR

This work studies parameter reconstruction in discretized inverse problems under limited data, introducing a random sketching approach that reduces the Gauss-Newton Hessian to a small $m\times m$ matrix ${\bf H}^N_s$ via uniform row sampling. It proves that, for rank $r$ and with $m\le r$, the sketched Hessian is well-conditioned with high probability, specifically $\mathbb{P}\left(\mathrm{cond}(\mathbf{H}^N_s) \le \frac{L+\tau(m)}{\ell-\tau(m)}\right) \ge 1-\frac{1}{r}$, where $\ell$ and $L$ bound the diagonal of ${\bf H}^N$ and $\tau(m)$ is a distortion term dependent on $m$, $r$, the coherence $\mu$, and norms of ${\bf H}^N$. The proof leverages matrix concentration inequalities and a moment bound on the hollow Gram component ${\bf M}^N_s={\bf H}^N_s-\mathrm{diag}(\mathbf{H}^N_s)$, yielding a sub-Gaussian tail. Numerical experiments on synthetic data and a PDE-based elliptic inverse problem corroborate the theory, showing that when $m$ scales with the data rank $r$, the reduced Hessian remains well-conditioned with high probability, enabling stable, locally unique reconstructions in the discrete setting.
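The sketching step summarized above can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes a random Gaussian factor $\pmb{\Phi}^N$ with ${\bf H}^N = \pmb{\Phi}^N \pmb{\Phi}^{N\top}$ (the synthetic setup described in the figure captions), uniform sampling without replacement, and hypothetical helper names `sketched_hessian` and `weyl_cond_bound`. The quantities `ell`, `L`, and `tau` below are computed from the realized sketch, so they only mimic the roles of $\ell$, $L$, and $\tau(m)$ in the stated bound; the inequality used is the standard Weyl perturbation bound, not the paper's probabilistic estimate.

```python
import numpy as np


def sketched_hessian(Phi, m, rng):
    """Return H_s = Phi_s Phi_s^T for m uniformly sampled rows of Phi.

    H_s equals the m x m principal submatrix of H = Phi Phi^T, so the
    full N x N matrix never has to be formed.
    Assumption: sampling is without replacement.
    """
    N = Phi.shape[0]
    idx = rng.choice(N, size=m, replace=False)
    Phi_s = Phi[idx, :]
    return Phi_s @ Phi_s.T


def weyl_cond_bound(H_s):
    """Bound cond(H_s) by (L + tau) / (ell - tau) using the realized sketch.

    ell, L : min and max diagonal entries of H_s
    tau    : spectral norm of the hollow part M_s = H_s - diag(H_s)
    Returns inf when tau >= ell, where the bound is vacuous.
    """
    d = np.diag(H_s)
    M_s = H_s - np.diag(d)
    tau = np.linalg.norm(M_s, 2)
    ell, L = d.min(), d.max()
    if tau >= ell:
        return float("inf")
    return float((L + tau) / (ell - tau))


rng = np.random.default_rng(0)
N, r, m = 5000, 100, 30               # sizes taken from the Figure 1 caption
Phi = rng.standard_normal((N, r))     # assumed Gaussian factor of H^N
H_s = sketched_hessian(Phi, m, rng)
cond_s = float(np.linalg.cond(H_s))
bound = weyl_cond_bound(H_s)
print(f"cond(H_s) = {cond_s:.2f}, Weyl-type bound = {bound:.2f}")
```

For unnormalized Gaussian data the hollow part can be large relative to the diagonal, in which case this realized bound is infinite even though the sketch itself is well-conditioned; the paper's $\tau(m)$ depends on $m$, $r$, the coherence $\mu$, and norms of ${\bf H}^N$, which this sketch does not model.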

Abstract

Theoretical inverse problems are often studied in an ideal infinite-dimensional setting. The well-posedness theory provides a unique reconstruction of the parameter function when an infinite amount of data is given. Through the lens of PDE-constrained optimization, this means one attains the zero-loss property of the mismatch function in this setting. This is no longer true in computations, where we are limited to a finite number of measurements for experimental or economic reasons. Consequently, one must relax the goal from inferring a function to inferring a discrete approximation. What is the reconstruction power of a fixed number of data observations? How many parameters can one reconstruct? Here we describe a probabilistic approach and spell out the interplay between the observation size $(r)$ and the number of parameters to be uniquely identified $(m)$. The technical pillar is the random sketching strategy, which relies heavily on matrix concentration inequalities and sampling theory. By analyzing a randomly subsampled Hessian matrix, we attain a well-conditioned reconstruction problem with high probability. Our main theory is validated in numerical experiments, using an elliptic inverse problem as an example.

Paper Structure

This paper contains 16 sections, 8 theorems, 61 equations, 7 figures, 4 tables.

Key Result

Theorem 1

Consider the Hessian matrix $\mathcal{H}\mathop{\mathrm{loss}}\nolimits(\pmb{\sigma}) \in \mathbb{R}^{N \times N}$ (eqn:hessian_N) of rank $r$. Fix the sampling dimension $m \in \mathbb{N}^+$, where $m \leq o(r) \leq N$. The subsampled Hessian ${\bf S}\,\mathcal{H}\mathop{\mathrm{loss}}\nolimits(\pmb{\sigma})\cdots$ [...] The constants in the $o$ notations are to be explicitly spelled out in the main Theorem (thm: well-c...

Figures (7)

  • Figure 1: Distributions of the $\min$ and $\max$ diagonal entries of ${\bf H}^N_s$ over 10,000 simulations. We sample with dimension $m = 30$ from the Gauss-Newton Hessian ${\bf H}^N = \pmb{\Phi}^N \pmb{\Phi}^{N\top}$, where $\pmb{\Phi}^N\in \mathbb{R}^{N \times r}$ is a random Gaussian matrix with $N = 5000$ and $r = 100$.
  • Figure 2: The concentration of condition number $\mathrm{cond} ({\bf H}^N_s).$
  • Figure 3: One solution to the forward model (eqn: EIT) at a certain source location.
  • Figure 4: Measurement layouts. The two layouts present sub-domains $\mathcal{D}_1$ and $\mathcal{D}_2.$
  • Figure 5: The entry magnitudes of the matrices $\pmb{\Phi}^N$ (top) and ${\bf H}^N$ (bottom). The left and right panels show results for $\mathcal{D}_1$ and $\mathcal{D}_2$, respectively. The magnitudes are shown on a logarithmic scale, and the data matrices are first smoothed by Gaussian filters with standard deviation 20.
  • ...and 2 more figures
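
The synthetic experiment described in the captions of Figures 1 and 2 can be approximated with a short Monte Carlo loop. The sketch below is an assumption-laden reconstruction, not the authors' script: it uses the sizes quoted in the Figure 1 caption, a much smaller trial count (200 rather than 10,000) to keep it fast, sampling without replacement, and a hypothetical helper `simulate_sketches`. It relies on two identities: the diagonal of ${\bf H}^N$ consists of the squared row norms of $\pmb{\Phi}^N$, and for $m \le r$ the condition number of ${\bf H}^N_s$ is the squared ratio of the extreme singular values of the sampled rows.

```python
import numpy as np


def simulate_sketches(Phi, m, n_trials, rng):
    """Collect min/max diagonal entries and cond(H_s) over repeated sketches.

    Assumption: each trial samples m row indices uniformly without replacement.
    """
    N = Phi.shape[0]
    # Diagonal of H = Phi Phi^T, i.e. squared Euclidean norm of each row.
    row_sq_norms = np.einsum("ij,ij->i", Phi, Phi)
    d_min = np.empty(n_trials)
    d_max = np.empty(n_trials)
    conds = np.empty(n_trials)
    for t in range(n_trials):
        idx = rng.choice(N, size=m, replace=False)
        d = row_sq_norms[idx]
        d_min[t], d_max[t] = d.min(), d.max()
        # Singular values are returned in descending order.
        s = np.linalg.svd(Phi[idx, :], compute_uv=False)
        conds[t] = (s[0] / s[-1]) ** 2
    return d_min, d_max, conds


rng = np.random.default_rng(1)
N, r, m, n_trials = 5000, 100, 30, 200
Phi = rng.standard_normal((N, r))
d_min, d_max, conds = simulate_sketches(Phi, m, n_trials, rng)
print("median min diag:", np.median(d_min))
print("median max diag:", np.median(d_max))
print("median cond(H_s):", np.median(conds))
```

Histograms of `d_min`, `d_max`, and `conds` would give rough analogues of the distributions the two captions describe; the exact values depend on the random seed and on normalization choices that the captions do not specify.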

Theorems & Definitions (15)

  • Theorem: exposition of the main theorem (thm: well-conditioned hess)
  • Definition 1
  • Theorem 2
  • Lemma 3
  • Proposition 4: sub-Gaussian distribution (Proposition 10 of T08-1)
  • Lemma 5
  • Proof: proof of (thm: well-conditioned hess)
  • Remark 1
  • Proof: proof of (prob comparison)
  • Lemma 6: Decoupling - Theorem 9 of T08-1
  • ...and 5 more