Near instance optimality of the Lanczos method for Stieltjes and related matrix functions

Marcel Schweitzer

TL;DR

This paper establishes a near instance optimality guarantee for the Lanczos method when computing $f(A)\mathbf{b}$, where $A$ is Hermitian positive definite and $f$ is a Stieltjes function. By representing $f(A)\mathbf{b}$ as an integral of shifted inverses and applying a Woodbury-based low-rank update to the Lanczos blocks, it shows that the Lanczos error is within a constant factor of the error of the best Krylov approximation, with an explicit bound depending on $\kappa(A)$ and the Lanczos coefficient $\beta_{m+1}$. The result extends to a related class of functions $f(z)=z g(z)$, where $g$ is Stieltjes. Numerical experiments examine the sharpness of the bound and indicate that it is often tighter than existing bounds. Consequently, one can analyze Lanczos performance for these functions via polynomial approximation on the spectrum of $A$, which gives a more problem-dependent understanding of convergence and guides expectations in applications such as fractional differential equations, network analysis, and Gaussian processes.
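
For orientation, the integral representation mentioned above can be written out as follows. This is the standard textbook form of a Stieltjes function rather than a formula quoted from the paper; the measure $\mu$, the Lanczos iterate $\mathbf{f}_m$, and the generic constant $C$ are notation introduced here, and the paper's own conventions may differ.

\[
f(z) = \int_0^\infty \frac{\mathrm{d}\mu(t)}{z+t}
\quad\Longrightarrow\quad
f(A)\mathbf{b} = \int_0^\infty (A+tI)^{-1}\mathbf{b}\,\mathrm{d}\mu(t).
\]

A familiar instance is the inverse fractional power: for $\alpha \in (0,1)$,

\[
z^{-\alpha} = \frac{\sin(\alpha\pi)}{\pi}\int_0^\infty \frac{t^{-\alpha}}{z+t}\,\mathrm{d}t .
\]

Schematically, a near instance optimality guarantee then has the shape

\[
\|f(A)\mathbf{b} - \mathbf{f}_m\| \;\le\; C \cdot \min_{\mathbf{x} \in \mathcal{K}_m(A,\mathbf{b})} \|f(A)\mathbf{b} - \mathbf{x}\|,
\]

where $\mathcal{K}_m(A,\mathbf{b})$ is the Krylov space and $C$ depends on $\kappa(A)$ and $\beta_{m+1}$. The exact constants and indices are those of Theorem 3.1 and are not reproduced here.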

Abstract

Polynomial Krylov subspace methods are among the most widely used methods for approximating $f(A)b$, the action of a matrix function on a vector, in particular when $A$ is large and sparse. When $A$ is Hermitian positive definite, the Lanczos method is the standard choice of Krylov method, and despite being very simplistic in nature, it often outperforms other, more sophisticated methods. In fact, one often observes that the error of the Lanczos method behaves almost exactly as the error of the best possible approximation from the Krylov space (which is in general not efficiently computable). However, theoretical guarantees for the deviation of the Lanczos error from the optimal error are mostly lacking so far (except for linear systems and a few other special cases). We prove a rigorous bound for this deviation when $f$ belongs to the important class of Stieltjes functions (which, e.g., includes inverse fractional powers as special cases) and a related class (which contains, e.g., the square root and the shifted logarithm), thus providing a \emph{near instance optimality} guarantee. While the constants in our bounds are likely not optimal, they greatly improve over the few results that are available in the literature and resemble the actual behavior much better.
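
The comparison described in the abstract can be tried on a small dense example. The following is a minimal sketch, not the paper's code: the helper names (`lanczos`, `fun_of_sym`, `lanczos_fAb`), the test matrix, and the use of full reorthogonalization are all assumptions made for illustration. It restricts to a real symmetric $A$ (a special case of Hermitian) and computes the best Krylov approximation by projecting the exact $f(A)\mathbf{b}$ onto the Krylov basis, which is only possible here because the problem is small enough to diagonalize.

```python
import numpy as np

def lanczos(A, b, m):
    """m Lanczos steps with full reorthogonalization (toy setting, no breakdown handling).
    Returns an orthonormal Krylov basis V (n x m) and the tridiagonal matrix T (m x m)."""
    n = b.shape[0]
    V = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m)  # beta[j] couples basis vectors j and j+1
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = A @ V[:, j]
        alpha[j] = V[:, j] @ w
        # Orthogonalize twice against all previous basis vectors for stability.
        for _ in range(2):
            w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)
        beta[j] = np.linalg.norm(w)
        if j + 1 < m:
            V[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)
    return V, T

def fun_of_sym(T, f):
    """Evaluate f(T) for a real symmetric matrix T via its eigendecomposition."""
    evals, Q = np.linalg.eigh(T)
    return (Q * f(evals)) @ Q.T

def lanczos_fAb(A, b, m, f):
    """Lanczos approximation ||b|| * V f(T) e_1 to f(A) b."""
    V, T = lanczos(A, b, m)
    e1 = np.zeros(m)
    e1[0] = 1.0
    return np.linalg.norm(b) * (V @ (fun_of_sym(T, f) @ e1)), V

# Hypothetical test problem: SPD matrix with eigenvalues in [1, 100].
rng = np.random.default_rng(0)
n = 100
eigs = np.linspace(1.0, 100.0, n)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q * eigs) @ Q.T
A = (A + A.T) / 2  # remove rounding asymmetry
b = rng.standard_normal(n)
b /= np.linalg.norm(b)

f = lambda x: 1.0 / np.sqrt(x)       # inverse square root, a Stieltjes function
exact = (Q * f(eigs)) @ (Q.T @ b)    # reference f(A) b from the known eigendecomposition

results = []
for m in (5, 10, 20):
    fm, V = lanczos_fAb(A, b, m, f)
    err_lanczos = np.linalg.norm(exact - fm)
    # Best approximation from the Krylov space: orthogonal projection of the exact vector.
    err_best = np.linalg.norm(exact - V @ (V.T @ exact))
    results.append((m, err_lanczos, err_best))
    print(f"m={m:2d}  Lanczos error={err_lanczos:.3e}  best Krylov error={err_best:.3e}")
```

Printing the ratio `err_lanczos / err_best` for each `m` gives an empirical counterpart to the constant in a near instance optimality bound. In practice the optimal error is not available, since it requires the exact $f(A)\mathbf{b}$.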

Paper Structure

This paper contains 8 sections, 8 theorems, 68 equations, 5 figures, 1 algorithm.

Key Result

Theorem 3.1

Let $A \in \mathbb{C}^{n \times n}$ be Hermitian positive definite with smallest and largest eigenvalue $\lambda_{\min}$ and $\lambda_{\max}$, respectively, let ${\mathbf b} \in \mathbb{C}^n$ with $\|{\mathbf b}\| = 1$ and let $f$ be a Stieltjes fun

Figures (5)

  • Figure 1: Sharpness of the estimates from Theorem~\ref{thm:near_instance_optimality_stieltjes} as well as of certain inequalities from its proof for the matrices $A_1$ (left) and $A_2$ (right) defined in the text of Example~\ref{ex:sharpness}. The vector ${\mathbf b}$ has normally distributed random entries and $f$ is the inverse square root (top row) or square root (bottom row).
  • Figure 2: Comparison of the norms of the two terms $\|f_1(T_m){\mathbf e}_m\|$ and $\|f_2(S_{M-m}){\mathbf e}_1\|$ contributing to the Lanczos error for the matrices $A_1$ (left) and $A_2$ (right) defined in the text of Example~\ref{ex:sharpness}. The vector ${\mathbf b}$ has normally distributed random entries and $f$ is the inverse square root. The bottom panel shows the ratio between the two terms.
  • Figure 3: Effective bound from Theorem~\ref{thm:near_instance_optimality_stieltjes} for the matrices $A_1$ (left) and $A_2$ (right) defined in the text of Example~\ref{ex:sharpness}. The function $f$ is the inverse square root and the vector ${\mathbf b}$ has normally distributed contribution from the eigenvectors ${\mathbf w}_{26},\dots,{\mathbf w}_{75}$, while the other eigenvectors have zero contribution. Thus, in \eqref{eq:mainresult1}, we replace $\lambda_{\min}$ by $\lambda_{26}$ and $\lambda_{\max}$ by $\lambda_{75}$.
  • Figure 4: Comparison of the near instance optimality guarantee from Theorem~\ref{thm:near_instance_optimality_stieltjes} to the near spectrum optimality guarantees \eqref{eq:near_spectrum_optimality_invsqrt} and \eqref{eq:near_spectrum_optimality_sqrt} from \cite{AmselChenGreenbaumMuscoMusco2023} as well as the near FOV optimality guarantee \eqref{eq:near_fov_optimality} for the matrices $A_1$ (left) and $A_2$ (right) defined in the text of Example~\ref{ex:sharpness}. The vector ${\mathbf b}$ has normally distributed random entries and $f$ is the inverse square root (top row) or square root (bottom row).
  • Figure 5: Comparison of the near instance optimality guarantee from Theorem~\ref{thm:near_instance_optimality_stieltjes} to the near instance optimality guarantee \eqref{eq:bound_amsel_etal} from \cite{AmselChenGreenbaumMuscoMusco2023} as well as the near FOV optimality guarantee \eqref{eq:near_fov_optimality} for the matrices $A_3$ (left) and $A_4$ (right) defined in the text of Example~\ref{ex:comparison_amsel_log}. The vector ${\mathbf b}$ has normally distributed random entries and $f$ is the logarithm (or a degree-10 rational approximation of the logarithm when using \eqref{eq:bound_amsel_etal}).

Theorems & Definitions (22)

  • Remark 2.1
  • Theorem 3.1
  • Proposition 3.2
  • Proof 1
  • Lemma 3.3
  • Proof 2
  • Lemma 3.4
  • Proof 3
  • Proof 4: Proof of Theorem~\ref{thm:near_instance_optimality_stieltjes}
  • Remark 3.5
  • ...and 12 more