Polynomial Preconditioning for the Action of the Matrix Square Root and Inverse Square Root
Andreas Frommer, Gustavo Ramirez-Hidalgo, Marcel Schweitzer, Manuel Tsolakis
TL;DR
This work develops polynomial preconditioning for Krylov subspace methods that approximate the action of the matrix square root and inverse square root on a vector, so that smaller subspaces suffice and less storage is needed. The preconditioner is $p(A)=(q(A))^2$, where the polynomial $q$ approximates $z^{-1/2}$. Because $z\,(q(z))^2 \approx 1$ on the spectrum of $A$, the preconditioned operator $A(q(A))^2$ is better conditioned than $A$, which accelerates convergence in both Hermitian and non-Hermitian settings. The authors propose several ways to construct the polynomial (Chebyshev expansions, Ritz-value interpolation, and contour-based error minimization) and provide spectral conditioning analyses together with numerical experiments on Poisson, lattice QCD, and graph Laplacian problems. The results indicate substantial speedups and storage savings over restarting or sketching approaches, which matters for large-scale scientific computing and data-analysis tasks involving matrix square roots and inverse square roots. The framework also clarifies which branch of the square root must be used and outlines how polynomial preconditioning could be extended to other matrix functions.
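The role of $q$ can be illustrated with a small numerical experiment. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a symmetric positive definite $A$ whose spectral interval $[a,b]$ is known, takes $q$ to be a Chebyshev interpolant of $z^{-1/2}$ on that interval, and relies on the identity $A^{-1/2}b = q(A)\,(A(q(A))^2)^{-1/2}\,b$, which holds when $q$ is positive on the spectrum of $A$. The helper names (`inv_sqrt_poly`, `apply_poly`, `krylov_inv_sqrt`), the test matrix, the degree 30, and the subspace size 5 are all illustrative choices.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def inv_sqrt_poly(a, b, deg):
    """Chebyshev interpolant q of z**-0.5 on [a, b]; assumes 0 < a < b."""
    return Chebyshev.interpolate(lambda z: z ** -0.5, deg, domain=[a, b])


def apply_poly(q, A, v):
    """Compute q(A) @ v using only matrix-vector products with A."""
    a, b = q.domain

    def S(x):
        # Shifted and scaled operator whose spectrum lies in [-1, 1].
        return (2.0 * (A @ x) - (a + b) * x) / (b - a)

    c = q.coef
    t_prev, t_cur = v, S(v)               # T_0(S) v and T_1(S) v
    out = c[0] * t_prev
    if len(c) > 1:
        out = out + c[1] * t_cur
    for ck in c[2:]:                      # three-term Chebyshev recurrence
        t_prev, t_cur = t_cur, 2.0 * S(t_cur) - t_prev
        out = out + ck * t_cur
    return out


def krylov_inv_sqrt(matvec, b, m):
    """m-step Krylov approximation of (operator)^(-1/2) b, symmetric case.

    Uses full orthogonalization (Arnoldi) for simplicity and therefore
    stores all m basis vectors.
    """
    V = np.zeros((b.size, m))
    H = np.zeros((m, m))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    for j in range(m):
        w = matvec(V[:, j])
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        if j + 1 < m:
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
    # Evaluate f(H) e_1 with f(z) = z**-0.5 via an eigendecomposition.
    theta, Y = np.linalg.eigh((H + H.T) / 2.0)
    return beta * (V @ (Y @ (Y[0] / np.sqrt(theta))))


# Illustrative test problem: SPD matrix with prescribed spectrum in [1, 100].
rng = np.random.default_rng(0)
n, deg, m = 200, 30, 5
lam = np.linspace(1.0, 100.0, n)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q * lam) @ Q.T
A = (A + A.T) / 2.0
b = rng.standard_normal(n)

q = inv_sqrt_poly(lam[0], lam[-1], deg)

# Eigenvalues of the preconditioned operator M = A q(A)^2 are lam * q(lam)^2.
mu = lam * q(lam) ** 2
kappa_A = lam[-1] / lam[0]
kappa_M = mu.max() / mu.min()


def M(v):
    """Apply the preconditioned operator A q(A)^2 to v."""
    return A @ apply_poly(q, A, apply_poly(q, A, v))


x_ref = Q @ ((Q.T @ b) / np.sqrt(lam))               # exact A^(-1/2) b
x_plain = krylov_inv_sqrt(lambda v: A @ v, b, m)     # unpreconditioned
x_pre = apply_poly(q, A, krylov_inv_sqrt(M, b, m))   # q(A) M^(-1/2) b

err_plain = np.linalg.norm(x_plain - x_ref) / np.linalg.norm(x_ref)
err_pre = np.linalg.norm(x_pre - x_ref) / np.linalg.norm(x_ref)
print(f"cond(A) = {kappa_A:.1f}, cond(A q(A)^2) = {kappa_M:.4f}")
print(f"relative error with {m} basis vectors: "
      f"plain {err_plain:.2e}, preconditioned {err_pre:.2e}")
```

In this sketch each application of the preconditioned operator costs many more matrix-vector products than one application of $A$, but only five basis vectors are stored. How to balance polynomial degree against subspace size depends on the problem.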
Abstract
While preconditioning is a long-standing concept for accelerating iterative methods for linear systems, generalizations to matrix functions are still in their infancy. We take a further step in this direction by introducing polynomial preconditioning for Krylov subspace methods that approximate the action of the matrix square root and inverse square root on a vector. Preconditioning reduces the subspace size and thereby avoids the storage problem and, for non-Hermitian matrices, the increased computational cost per iteration that arise in the unpreconditioned case. Polynomial preconditioning is an attractive alternative to current restarting or sketching approaches since it is simpler and computationally more efficient. We demonstrate this in several numerical examples.
