A Gauss-Newton Method with No Additional PDE Solves Beyond Gradient Evaluation for Large-Scale PDE-Constrained Inverse Problems

Cash Cherry; Samy Wu Fung; Luis Tenorio; Ebru Bozdağ

TL;DR

This paper proposes a Gauss-Newton approach that requires no PDE solves beyond those already needed for gradient computation. It achieves the efficiency of gradient-based schemes while retaining the fast convergence of Gauss-Newton methods.

Abstract

Partial Differential Equation (PDE)-constrained optimization problems often involve optimizing an objective function given as a sum of loss terms. Each function or gradient evaluation requires one or more PDE solves, which makes these problems computationally demanding. While Gauss-Newton methods are well-suited for large-scale PDE-constrained optimization, their application to settings such as Full-Waveform Inversion (FWI) is hindered by the need for additional PDE solves to compute Jacobian-vector products. This paper proposes a Gauss-Newton approach that eliminates the need for extra PDE solves beyond those required for gradient computation. Our numerical experiments on FWI demonstrate that the proposed method achieves the efficiency of gradient-based schemes while retaining the fast convergence of Gauss-Newton methods.
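For context, the following is a minimal sketch of the classical Gauss-Newton iteration on a toy curve-fitting problem. It is not the GOGN method proposed in the paper, whose construction is not reproduced here. The function names (`gauss_newton`, `residual`, `jacobian`), the exponential model, and the simple backtracking rule are all assumptions made for illustration. In this toy setting the Jacobian is a small dense matrix; in PDE-constrained problems such as FWI it is not formed explicitly, and each Jacobian-vector product requires additional PDE solves, which is the cost the abstract refers to.

```python
import numpy as np

def gauss_newton(residual, jacobian, m0, iters=50, tol=1e-12):
    """Classical Gauss-Newton for min_m 0.5 * ||r(m)||^2 (illustrative only)."""
    m = np.asarray(m0, dtype=float)
    for _ in range(iters):
        r = residual(m)
        J = jacobian(m)
        g = J.T @ r                      # gradient of 0.5 * ||r||^2
        if np.linalg.norm(g) < tol:
            break
        # Gauss-Newton direction: least-squares solution of J p = -r,
        # equivalent to solving (J^T J) p = -g when J has full column rank.
        p, *_ = np.linalg.lstsq(J, -r, rcond=None)
        # Simple backtracking (an assumption, not the paper's line search):
        # halve the step until the objective does not increase.
        f = 0.5 * (r @ r)
        alpha = 1.0
        while alpha > 1e-8:
            r_new = residual(m + alpha * p)
            if 0.5 * (r_new @ r_new) <= f:
                break
            alpha *= 0.5
        m = m + alpha * p
    return m

# Toy problem (hypothetical): recover (a, b) in d(t) = a * exp(b * t)
t = np.linspace(0.0, 1.0, 20)
m_true = np.array([2.0, -1.5])
d = m_true[0] * np.exp(m_true[1] * t)    # noise-free synthetic data

def residual(m):
    return m[0] * np.exp(m[1] * t) - d

def jacobian(m):
    e = np.exp(m[1] * t)
    return np.column_stack([e, m[0] * t * e])   # columns: dr/da, dr/db

m_hat = gauss_newton(residual, jacobian, m0=[1.0, -1.0])
```

Here the Jacobian has only two columns and is cheap to assemble. When the residual depends on the model through a PDE solve, the products `J @ v` and `J.T @ w` are the expensive operations in a Gauss-Newton iteration.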

Paper Structure (18 sections, 1 theorem, 33 equations, 13 figures)

Introduction
Mathematical Background
Gauss-Newton Method
An Efficient Gradient-Only Gauss-Newton Method (GOGN)
Problem Reformulation
Constructing the GOGN Jacobian from Available Gradients
The Gradient-Only Gauss-Newton Update
Convergence Analysis
Summary and Computational Discussion
Related Work
Numerical Experiments
Experimental Setup
Algorithmic Setup
Results and Discussion
Conclusion
...and 3 more sections

Key Result

Theorem 1

Let $\mathcal{C} = \{\mathbf{m} \: | \: F(\mathbf{m}) \leq F(\mathbf{m}_0) \}$ be the sublevel set defined by a starting point $\mathbf{m}_0$, and assume $\mathcal{C}$ is compact. Suppose $F$ has Lipschitz-continuous gradients on an open set containing $\mathcal{C}$, and the regularizer satisfies for all $\mathbf{m} \in \mathcal{C}$ with $0 < \mu \leq M < \infty$. Consider the GOGN iterates, $\mathbf{m}_{k+1} = \mathbf{m}_k + \alpha_k\, \mathbf

Figures (13)

Figure 1: Four source-receiver configurations displayed over the speed perturbation ${\delta c}/{c_0}$ of the target model. Top left: $8$ sources and $300$ receivers sampled from a uniform distribution over the square $[100\,{\rm km}, 400\,{\rm km}]^2$. Top right: $5$ source and $181$ receiver locations from around the Pacific Ocean, transformed to lie within our domain. Bottom row: the top-row configurations with additional sources, so that each has $25$ sources.
Figure 2: Convergence plots for uniformly distributed receiver coverage (top row, $8$ sources) and realistic coverage (bottom row, $5$ sources) at a noise level of $\sigma = 0.1$, comparing nonlinear conjugate gradient (NLCG), limited-memory BFGS (LBFGS), gradient-only Gauss-Newton (GOGN), and Gauss-Newton CG (GNCG). In every panel the x-axis is the number of PDE solves during optimization; the y-axes show model error (left), gradient norm (middle), and objective function value (right).
Figure 3: Model reconstructions for noise level $0.1$ based on the experiments in Figure 2
Figure 4: Convergence plots for uniformly distributed receiver coverage (top row) and realistic coverage (bottom row) for $25$ sources at a noise level of $\sigma = 0.1$. In every panel the x-axis is the number of PDE solves during optimization; the y-axes show model error (left), gradient norm (middle), and objective function value (right).
Figure 5: Final reconstructions from the experiments in Figure 4
...and 8 more figures

Theorems & Definitions (2)

theorem 1
proof
