Nearly Optimal Approximation of Matrix Functions by the Lanczos Method

Noah Amsel; Tyler Chen; Anne Greenbaum; Cameron Musco; Chris Musco

TL;DR

The Lanczos method for matrix functions, called Lanczos-FA, matches the error of the best possible Krylov subspace method up to a multiplicative approximation factor, providing a strong justification for the excellent performance of Lanczos-FA, especially on functions that are well approximated by rationals.

Abstract

Approximating the action of a matrix function $f(\mathbf{A})$ on a vector $\mathbf{b}$ is an increasingly important primitive in machine learning, data science, and statistics, with applications such as sampling high-dimensional Gaussians, Gaussian process regression and Bayesian inference, principal component analysis, and approximating Hessian spectral densities. Over the past decade, a number of algorithms enjoying strong theoretical guarantees have been proposed for this task. Many of the most successful belong to a family of algorithms called Krylov subspace methods. Remarkably, a classic Krylov subspace method, called the Lanczos method for matrix functions (Lanczos-FA), frequently outperforms newer methods in practice. Our main result is a theoretical justification for this finding: we show that, for a natural class of rational functions, Lanczos-FA matches the error of the best possible Krylov subspace method up to a multiplicative approximation factor. The approximation factor depends on the degree of $f(x)$'s denominator and the condition number of $\mathbf{A}$, but not on the number of iterations $k$. Our result provides a strong justification for the excellent performance of Lanczos-FA, especially on functions that are well approximated by rationals, such as the matrix square root.
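
For readers unfamiliar with the method named in the abstract, the following is a minimal NumPy sketch of the textbook Lanczos-FA iteration, not the authors' implementation. It assumes $\mathbf{A}$ is real symmetric and that $f$ acts elementwise on an array of eigenvalues. The function name `lanczos_fa`, the breakdown tolerance `tol`, and the use of full reorthogonalization are illustrative choices that do not come from the paper.

```python
import numpy as np


def lanczos_fa(A, b, k, f, tol=1e-12):
    """Sketch of Lanczos-FA: approximate f(A) @ b from the Krylov subspace
    span{b, A b, ..., A^(k-1) b}. A is assumed to be real symmetric."""
    n = b.shape[0]
    Q = np.zeros((n, k))            # orthonormal Krylov basis
    alpha = np.zeros(k)             # diagonal of the tridiagonal matrix T
    beta = np.zeros(max(k - 1, 0))  # off-diagonal of T
    Q[:, 0] = b / np.linalg.norm(b)
    m = k                           # number of basis vectors actually built
    for j in range(k):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        # Full reorthogonalization against all basis vectors built so far
        # (an assumption made here for numerical robustness).
        w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j + 1 < k:
            beta[j] = np.linalg.norm(w)
            if beta[j] < tol:       # invariant subspace found; stop early
                m = j + 1
                break
            Q[:, j + 1] = w / beta[j]
    T = (np.diag(alpha[:m])
         + np.diag(beta[: m - 1], 1)
         + np.diag(beta[: m - 1], -1))
    theta, S = np.linalg.eigh(T)    # T = S diag(theta) S^T
    fT_e1 = S @ (f(theta) * S[0, :])  # f(T) applied to the first unit vector
    return np.linalg.norm(b) * (Q[:, :m] @ fT_e1)


# Usage on a small synthetic problem: approximate the matrix square root.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((20, 20)))
lam = np.linspace(1.0, 20.0, 20)
A = U @ np.diag(lam) @ U.T
A = (A + A.T) / 2                   # remove rounding asymmetry
b = rng.standard_normal(20)
exact = U @ (np.sqrt(lam) * (U.T @ b))
approx = lanczos_fa(A, b, 10, np.sqrt)
print(np.linalg.norm(exact - approx) / np.linalg.norm(exact))
```

The sketch only touches $\mathbf{A}$ through matrix-vector products, which is what makes Krylov subspace methods attractive for large problems; the small eigendecomposition is of the $k \times k$ tridiagonal matrix only.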

Paper Structure (34 sections, 12 theorems, 81 equations, 8 figures)

Introduction
Krylov subspace methods
Optimality guarantees for Krylov subspace methods
Existing near-optimality analyses of Lanczos-FA
Our contributions
Near optimality for rational functions
Proof Sketch
Implications for non-rational functions
Near Spectrum Optimality for $\mathbf{A}^{\pm 1/2}\mathbf{b}$
Experiments
Dependence on the rational function degree
Non-rational functions
Additional Experiments
Outlook
Extension to other function classes
...and 19 more sections

Key Result

Theorem 4

Let $r(x) = n(x)/m(x)$ be a degree-$(p,q)$ rational function as in \eqref{eqn:rat_form} and define $\mathbf{A}_j$ as in \eqref{eq:aj}. Then, if $k > \max \{p, q-1\}$, the Lanczos-FA iterate satisfies the bound

Figures (8)

Figure 1: Lanczos-FA error $\| f(\mathbf{A})\mathbf{b} - \mathsf{lan}_{k}(f;\mathbf{A},\mathbf{b}) \|_2$ at each iteration for several functions/spectra. "Instance Optimal" is the right-hand side of \ref{def:instance_opt} with $C=c=1$, which is a lower bound for all Krylov subspace methods (KSMs), including Lanczos-FA. Lanczos-FA performs nearly instance optimally on a wide range of problems, far better than \ref{fact:lan_is_FOV} predicts. This is easily seen in the bottom plots, which show the ratio of the error of the Lanczos-FA iterate to that of the Krylov optimal iterate, $\mathsf{opt}_{k}(f;\mathbf{A},\mathbf{b})$.
Figure 2: Despite its large prefactor, the bound of \ref{thm:main} qualitatively captures the convergence behavior of Lanczos-FA for rational functions. It can be tighter than the standard bound of \ref{fact:lan_is_FOV}, even for a moderate number of iterations $k$. We use rational approximations to $\exp(-x/10)$ and $\log(x)$ for comparison with \ref{fig:intro_motivating}; see \ref{sec:experiments} for more details.
Figure 3: The maximum observed ratio between the error of Lanczos-FA and the optimal error over choices of $\mathbf{b}$ when approximating $\mathbf{A}^{-q}$ for matrices with varying condition number $\kappa$. Each point corresponds to a pair $(\kappa, q)$. Points with the same color have the same value of $\kappa$. On the left, the dotted line plots $\sqrt{q \kappa}$ for the maximum $\kappa$ considered ($10^6$). On the right, the dotted line plots $\sqrt{q \kappa}$ for the maximum $q$ considered ($2^6$). Overall, the optimality ratio appears to scale at least as $\Omega(\sqrt{q \kappa})$.
Figure 4: Applying Lanczos-FA to the function $\mathbf{A}^{-0.4}$ and rational approximations of various degrees found using the BRASIL algorithm \cite{clemens_21}. In this experiment, the spectrum of $\mathbf{A}$ contains two clusters: 10 eigenvalues uniformly spaced near 1, and 90 eigenvalues uniformly spaced near 100. As predicted by the bound in \ref{sec:triangle_inequality}, convergence of Lanczos-FA for this function appears to closely track that of a high-degree rational approximant.
Figure 5: A comparison of Lanczos-FA with two methods from \cite{jin_sidford_19} ("rational" and "slanczos") for computing the matrix sign function, which work by using a stochastic iterative method to approximate rational approximations to the step function of various degrees. The "rational" method is the main one studied in \cite{jin_sidford_19}, while "slanczos" is included because it is the best performing in their experiments. Each panel corresponds to one of the test problems from \cite{jin_sidford_19}. Iterations of these methods are counted in number of inner products with rows of $\mathbf{A}$ rather than number of matrix-vector products with $\mathbf{A}$ as a whole. To compare these with Lanczos-FA, we consider $d$ such inner products to be equivalent to one matrix-vector product.
...and 3 more figures
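
The ratio plotted in the bottom panels of Figure 1 can be reproduced on small dense problems with a sketch like the one below. It is illustrative only and is not the authors' experimental code. It assumes $\mathbf{A}$ is real symmetric and small enough for a full eigendecomposition, that the Krylov optimal iterate in the 2-norm is the orthogonal projection of $f(\mathbf{A})\mathbf{b}$ onto the Krylov subspace, and that no Lanczos breakdown occurs for the chosen $k$. The names `krylov_basis` and `optimality_ratio` are hypothetical.

```python
import numpy as np


def krylov_basis(A, b, k):
    """Orthonormal basis of span{b, A b, ..., A^(k-1) b}, built with two
    Gram-Schmidt passes per step. Assumes no breakdown occurs."""
    Q = np.zeros((b.shape[0], k))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k - 1):
        w = A @ Q[:, j]
        for _ in range(2):          # second pass guards against rounding
            w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        Q[:, j + 1] = w / np.linalg.norm(w)
    return Q


def optimality_ratio(A, b, k, f):
    """Ratio of the Lanczos-FA error to the Krylov optimal error."""
    lam, U = np.linalg.eigh(A)
    exact = U @ (f(lam) * (U.T @ b))          # reference value of f(A) b
    Q = krylov_basis(A, b, k)
    T = Q.T @ (A @ Q)                          # tridiagonal up to rounding
    theta, S = np.linalg.eigh(T)
    lan = np.linalg.norm(b) * (Q @ (S @ (f(theta) * S[0, :])))
    opt = Q @ (Q.T @ exact)                    # best 2-norm approximation in the subspace
    return np.linalg.norm(exact - lan) / np.linalg.norm(exact - opt)


# Usage on a synthetic positive definite matrix with f = sqrt.
rng = np.random.default_rng(1)
M = rng.standard_normal((30, 30))
A = M @ M.T + 30 * np.eye(30)
b = rng.standard_normal(30)
for k in (2, 5, 10):
    print(k, optimality_ratio(A, b, k, np.sqrt))
```

Because the Lanczos-FA iterate lies in the same subspace as the projection, the ratio is at least 1 up to rounding error; a value close to 1 corresponds to the nearly instance-optimal behavior described in the caption.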

Theorems & Definitions (26)

Definition 1
Definition 2: Near Instance Optimality
Theorem 4
Lemma 5
Theorem 6
Theorem 7
Lemma 8: \cite{druskin_knizhnerman_89, saad_92}
proof
Lemma 9
proof
...and 16 more
