Table of Contents
Fetching ...

Point spread function approximation of high rank Hessians with locally supported non-negative integral kernels

Nick Alger, Tucker Hartland, Noemi Petra, Omar Ghattas

TL;DR

Addresses the challenge of approximating high-rank Hessians in PDE-constrained inverse problems. It proposes a matrix-free PSF-based framework that computes impulse-response batches via applying the operator to Dirac combs and uses normalized local mean displacement invariance to interpolate kernel entries, yielding an H-matrix approximation of the operator. The approach delivers large reductions in PDE solves when preconditioning the Gauss-Newton Hessian $\mathbf{H}_{gn}$, as shown in ice-sheet basal friction and advection-diffusion inversions, and also applies to a PDE-free blur example. The method scales to high-rank regimes by limiting the number of operator applications and exploiting locality and non-negativity, offering a data-scalable alternative to low-rank Hessian approximations.

Abstract

We present an efficient matrix-free point spread function (PSF) method for approximating operators that have locally supported non-negative integral kernels. The method computes impulse responses at scattered points, and interpolates these impulse responses to approximate integral kernel entries. Impulse responses are computed by applying the operator to Dirac comb batches of point sources, which are chosen via an ellipsoid packing procedure. Evaluation of kernel entries allows us to construct a hierarchical matrix approximation of the operator, which is used for further matrix computations. We illustrate the end-to-end method on a blur problem, then use the method to build preconditioners for the Hessian in two inverse problems governed by partial differential equations (PDEs): inversion for the basal friction coefficient in an ice sheet flow problem and for the initial condition in an advective-diffusive transport problem. While for many ill-posed inverse problems the Hessian of the data misfit term exhibits a low rank structure, and hence a low rank approximation is suitable, for many problems of practical interest the numerical rank of the Hessian is still large. But Hessian impulse responses typically become more local as the numerical rank increases, which benefits the PSF method. Numerical results reveal that the PSF preconditioner clusters the spectrum of the preconditioned Hessian near one, yielding roughly 5x-10x reductions in the required number of PDE solves, as compared to regularization preconditioning and no preconditioning. We also present a numerical study for the influence of various parameters (that control the shape of the impulse responses) on the effectiveness of the advection-diffusion Hessian approximation. The results show that the PSF-based preconditioners are able to form good approximations of high-rank Hessians using a small number of operator applications.

Point spread function approximation of high rank Hessians with locally supported non-negative integral kernels

TL;DR

Addresses the challenge of approximating high-rank Hessians in PDE-constrained inverse problems. It proposes a matrix-free PSF-based framework that computes impulse-response batches via applying the operator to Dirac combs and uses normalized local mean displacement invariance to interpolate kernel entries, yielding an H-matrix approximation of the operator. The approach delivers large reductions in PDE solves when preconditioning the Gauss-Newton Hessian , as shown in ice-sheet basal friction and advection-diffusion inversions, and also applies to a PDE-free blur example. The method scales to high-rank regimes by limiting the number of operator applications and exploiting locality and non-negativity, offering a data-scalable alternative to low-rank Hessian approximations.

Abstract

We present an efficient matrix-free point spread function (PSF) method for approximating operators that have locally supported non-negative integral kernels. The method computes impulse responses at scattered points, and interpolates these impulse responses to approximate integral kernel entries. Impulse responses are computed by applying the operator to Dirac comb batches of point sources, which are chosen via an ellipsoid packing procedure. Evaluation of kernel entries allows us to construct a hierarchical matrix approximation of the operator, which is used for further matrix computations. We illustrate the end-to-end method on a blur problem, then use the method to build preconditioners for the Hessian in two inverse problems governed by partial differential equations (PDEs): inversion for the basal friction coefficient in an ice sheet flow problem and for the initial condition in an advective-diffusive transport problem. While for many ill-posed inverse problems the Hessian of the data misfit term exhibits a low rank structure, and hence a low rank approximation is suitable, for many problems of practical interest the numerical rank of the Hessian is still large. But Hessian impulse responses typically become more local as the numerical rank increases, which benefits the PSF method. Numerical results reveal that the PSF preconditioner clusters the spectrum of the preconditioned Hessian near one, yielding roughly 5x-10x reductions in the required number of PDE solves, as compared to regularization preconditioning and no preconditioning. We also present a numerical study for the influence of various parameters (that control the shape of the impulse responses) on the effectiveness of the advection-diffusion Hessian approximation. The results show that the PSF-based preconditioners are able to form good approximations of high-rank Hessians using a small number of operator applications.
Paper Structure (25 sections, 28 equations, 14 figures, 5 tables)

This paper contains 25 sections, 28 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: (\ref{['fig:batches_intro']}) One batch, $\eta_b$, of normalized impulse responses, $\phi_{x}$, that arise from applying $\mathcal{A}$ to a weighted sum of scattered point sources (see Section \ref{['sec:get_impulse_response']}). Here, $\mathcal{A}$ is the ice sheet inverse problem data misfit Gauss-Newton Hessian described in Section \ref{['sec:numerical_results']}. Black stars are point source locations. Shading shows the magnitude of the normalized impulse responses (darker means larger function values). Dashed gray ellipses are estimated impulse response support ellipsoids based on the moment method in Section \ref{['sec:intromoments']}. The large circle is $\partial \Omega$. (\ref{['fig:mean_displacement_invariance']}) Illustration of impulse responses, $\phi_x$ and $\phi_{x'}$, corresponding to points $x$ and $x'$. The operator $\mathcal{A}$ is locally mean displacement invariant (Section \ref{['sec:local_mean_displacement_invariance']}) if $\phi_{x}(y) \approx \phi_{x'}\left(y - \mu(x) + \mu(x')\right)$ when $x$ is close to $x'$. Here, $\mu(z)$ denotes the mean (center of mass) of $\phi_z$.
  • Figure 1: Illustration of the influence of negative numbers in the integral kernel on the robustness of the ellipsoid estimates for the supports of impulse responses. Left two columns: Blur kernel given in Equation \ref{['eq:frog_kernel']}. Right two columns: Ricker wavelet-type kernel given by $\Phi(y,x) = (1 - a \gamma)\exp\left(-\gamma/2\right)$, where $\gamma=(y-x)^T \Sigma^{-1} (y-x)$, and $\Sigma=\operatorname{diag}\left(0.0025, 0.01\right)$. Ordered from top to bottom, the results are obtained with $a\in\{1.0, 20.0, 27.0\}$ for the left two columns, and $a\in \{0.0, 0.23, 0.249\}$ for the right two columns. Columns 1 and 3: impulse responses with estimated support ellipsoids indicated by the black ellipses. Red and blue represent positive and negative numbers in the integral kernel, respectively. Columns 2 and 4: one-dimensional slice along the horizontal line indicated in the two-dimensional plots. The dashed gray line is at zero.
  • Figure 1: (Ice sheet) (\ref{['fig:ice_mountain_mesh']}) Bird's eye view of the ice sheet discretized by a mesh of tetrahedra. Color indicates the height of the base of the ice sheet (i.e., the mountain topography). The radius of the domain is $10^4$ meters, the maximum height of the mountain is $2.1 \times 10^3$ meters, and the average thickness of the ice sheet is 250 meters. (\ref{['fig:true_beta']}) True parameter, ${q}_\text{true}$. (\ref{['fig:stokes_velocity']}) True velocity, $v_\text{true}$. Arrows indicate the direction of $v_\text{true}$ and color indicates the magnitude of $v_\text{true}$.
  • Figure 2: Left: Matrix created by evaluating the integral kernel $\Phi$ for $\mathcal{A}$ (Equation \ref{['eq:kernel_representation']}) at all pairs of mesh vertices. This illustration is for the integral kernel in Equation \ref{['eq:frog_kernel']}. Dark colors indicate large entries and light colors indicate small entries. Rows and columns are ordered according to a kd-tree hierarchical clustering. Right: Impulse responses associated with points $x_1,x_2\in \Omega$, shown by the two dotted vertical lines. Intuitively, one may think of impulse responses as "columns" of the integral kernel.
  • Figure 2: (Ice sheet) The log basal friction parameter, with color scale as in Figure \ref{['fig:true_beta']}, computed from the PDE constrained optimization problem with noise levels: $25\%$ (left), $5.0\%$ (middle), and $1.0\%$ (right).
  • ...and 9 more figures