Table of Contents
Fetching ...

Polynomial Approximation to the Inverse of a Large Matrix

Mark Embree, Joel A. Henningsen, Jordan Jackson, Ronald B. Morgan

TL;DR

This work investigates approximating the inverse of a large matrix by a polynomial in the matrix, $p(A)$, derived from the GMRES residual polynomial. It establishes theoretical bounds tying the quality of $p(A)$ to GMRES residuals, and develops stability techniques (extra root copies) and deflation to handle challenging spectra. The paper compares multiple construction strategies, including full GMRES, polynomial preconditioning (double polynomials), restarted GMRES, and nonsymmetric Lanczos, highlighting practical tradeoffs for solving linear systems with multiple right-hand sides and for variance reduction in Monte Carlo methods. Overall, polynomial inverses offer a scalable, cache-friendly alternative to direct factorization, with strong potential when combined with deflation and stability controls, though challenges remain for highly ill-conditioned or widely gapped spectra.

Abstract

The inverse of a large matrix can often be accurately approximated by a polynomial of degree significantly lower than the order of the matrix. The iteration polynomial generated by a run of the GMRES algorithm is a good candidate, and its approximation to the inverse often seems to track the accuracy of the GMRES iteration. We investigate the quality of this approximation through theory and experiment, noting the practical need to add copies of some polynomial terms to improve stability. To mitigate storage and orthogonalization costs, other approaches have appeal, such as polynomial preconditioned GMRES and deflation of problematic eigenvalues. Applications of such polynomial approximations include solving systems of linear equations with multiple right-hand sides (where the solutions to subsequent problems come simply by multiplying the polynomial against the new right-hand sides) and variance reduction in multilevel Monte Carlo methods.

Polynomial Approximation to the Inverse of a Large Matrix

TL;DR

This work investigates approximating the inverse of a large matrix by a polynomial in the matrix, , derived from the GMRES residual polynomial. It establishes theoretical bounds tying the quality of to GMRES residuals, and develops stability techniques (extra root copies) and deflation to handle challenging spectra. The paper compares multiple construction strategies, including full GMRES, polynomial preconditioning (double polynomials), restarted GMRES, and nonsymmetric Lanczos, highlighting practical tradeoffs for solving linear systems with multiple right-hand sides and for variance reduction in Monte Carlo methods. Overall, polynomial inverses offer a scalable, cache-friendly alternative to direct factorization, with strong potential when combined with deflation and stability controls, though challenges remain for highly ill-conditioned or widely gapped spectra.

Abstract

The inverse of a large matrix can often be accurately approximated by a polynomial of degree significantly lower than the order of the matrix. The iteration polynomial generated by a run of the GMRES algorithm is a good candidate, and its approximation to the inverse often seems to track the accuracy of the GMRES iteration. We investigate the quality of this approximation through theory and experiment, noting the practical need to add copies of some polynomial terms to improve stability. To mitigate storage and orthogonalization costs, other approaches have appeal, such as polynomial preconditioned GMRES and deflation of problematic eigenvalues. Applications of such polynomial approximations include solving systems of linear equations with multiple right-hand sides (where the solutions to subsequent problems come simply by multiplying the polynomial against the new right-hand sides) and variance reduction in multilevel Monte Carlo methods.

Paper Structure

This paper contains 19 sections, 5 theorems, 27 equations, 13 figures, 7 tables, 4 algorithms.

Key Result

Theorem 1

\newlabelthm:general0 Suppose $A\in\mathbbm{C}^{n\times n}$ is invertible and $Ax=b$ for some unit vector $b\in\mathbbm{C}^n$. Let $\widehat{x} = p(A) b$ be an approximation to $x$ for some polynomial $p$, and let $\kappa(A) = \|A\| \|A^{-1}\|$. Then

Figures (13)

  • Figure 1: \newlabelfig:cartoon0 Adding an extra copy of a root can stabilize $\pi$ and $p$. This $A$ has real eigenvalues in $[0.1,1]$ and one outlying eigenvalue at $\lambda=2$. Left: the degree $k=5$ GMRES residual polynomial $\pi(z)$ (red) with roots at the black dots; note the steep slope at the root $\theta = 1.9869$ near the eigenvalue $\lambda=2$. Adding an extra copy of that root leads to the degree $k+1=6$ polynomial (blue) that is small in a larger neighborhood of the root. Right: the corresponding approximate inverse polynomial $p(z)$, which interpolates $1/z$ (gray line) at the black dots. The extra root has a similarly tonic effect on this polynomial, which now also interpolates the derivative of $1/\lambda$ at the extra root.
  • Figure 2: \newlabelfig:polyacc10 Example 1: The matrix is size $n = 2500$ from the convection-diffusion equation $- u_{xx} - u_{yy} + 2 u_{x} = f$. The relative accuracy, $\|A^{-1} - p(A)\| / \|A^{-1}\|$, of the polynomial approximation to the inverse is shown every 10 GMRES iterations, and is compared to the GMRES residual norm.
  • Figure 3: Convection-diffusion example from Figure \ref{['fig:polyacc1']} (dimension $n=2500$), showing how the polynomial $p(z)$ approximates $1/z$ at the eigenvalues of $A$: the gray line shows $1/z$; the markers show $p(\lambda)$ for the eigenvalues $\lambda$ of $A$, for three different degree polynomials $p$ of degree $k-1$.
  • Figure 4: Matrix of size $n = 2500$ from the convection-diffusion equation $- u_{xx} - u_{yy} + \alpha u_{x} + \beta u_{y} - \gamma^2 u = f$. The relative accuracy of $p(A)$ is compared to the GMRES residual for variying degrees of nonnormality and indefiniteness. (The last few red dots in the bottom plots likely reflect some numerical instabilities.)
  • Figure 5: \newlabelfig:hritz_plots0 The top plots show the harmonic Ritz values from 100 trials involving random Krylov subspaces of dimension $k=5$ for Hermitian matrices of dimension $n=8$, sorted $\theta_1 \le \theta_2 \le \cdots \le \theta_5$. Eigenvalues are marked by gray vertical lines; the origin is denoted by the vertical black dashed line. The positive definite matrix on the left has eigenvalues $\{1,2,3,4,7,8,9,10\}$, and the harmonic Ritz values obey Cauchy interlacing. The indefinite problem on the right has the same eigenvalues, only shifted left: $\{-4,-3,-2,-1, 2,3,4,5\}$. Notice the absence of harmonic Ritz values in the interval $(-1,2)$; numerous $\theta_j$ values are beyond the axis limits for this case. The two plots would be identical (up to the shift) if they showed standard Ritz values, which are shift invariant. The bottom plots show how $p(z)$ (red line) interpolates $1/z$ (gray line) for one of the trials. Notice that approximating $1/z$ at eigenvalues on both sides of the origin (indefinite problem on the right) is much more challenging.
  • ...and 8 more figures

Theorems & Definitions (8)

  • Theorem 1
  • Proof 1
  • Theorem 2
  • Proof 2
  • Proposition 3
  • Theorem 4
  • Theorem 5
  • Proof 3