Near-optimal hierarchical matrix approximation from matrix-vector products

Tyler Chen; Feyza Duman Keles; Diana Halikias; Cameron Musco; Christopher Musco; David Persson

Near-optimal hierarchical matrix approximation from matrix-vector products

Tyler Chen, Feyza Duman Keles, Diana Halikias, Cameron Musco, Christopher Musco, David Persson

TL;DR

This work describes a randomized algorithm for producing a near-optimal hierarchical off-diagonal low-rank (HODLR) approximation to an $n\times n$ matrix and is the first matrix-vector query algorithm to enjoy theoretical worst-case guarantees for approximation by any hierarchical matrix class.

Abstract

We describe a randomized algorithm for producing a near-optimal hierarchical off-diagonal low-rank (HODLR) approximation to an $n\times n$ matrix $\mathbf{A}$, accessible only though matrix-vector products with $\mathbf{A}$ and $\mathbf{A}^{\mathsf{T}}$. We prove that, for the rank-$k$ HODLR approximation problem, our method achieves a $(1+β)^{\log(n)}$-optimal approximation in expected Frobenius norm using $O(k\log(n)/β^3)$ matrix-vector products. In particular, the algorithm obtains a $(1+\varepsilon)$-optimal approximation with $O(k\log^4(n)/\varepsilon^3)$ matrix-vector products, and for any constant $c$, an $n^c$-optimal approximation with $O(k \log(n))$ matrix-vector products. Apart from matrix-vector products, the additional computational cost of our method is just $O(n \operatorname{poly}(\log(n), k, β))$. We complement the upper bound with a lower bound, which shows that any matrix-vector query algorithm requires at least $Ω(k\log(n) + k/\varepsilon)$ queries to obtain a $(1+\varepsilon)$-optimal approximation. Our algorithm can be viewed as a robust version of widely used "peeling" methods for recovering HODLR matrices and is, to the best of our knowledge, the first matrix-vector query algorithm to enjoy theoretical worst-case guarantees for approximation by any hierarchical matrix class. To control the propagation of error between levels of hierarchical approximation, we introduce a new perturbation bound for low-rank approximation, which shows that the widely used Generalized Nyström method enjoys inherent stability when implemented with noisy matrix-vector products. We also introduce a novel randomly perforated matrix sketching method to further control the error in the peeling algorithm.

Near-optimal hierarchical matrix approximation from matrix-vector products

TL;DR

This work describes a randomized algorithm for producing a near-optimal hierarchical off-diagonal low-rank (HODLR) approximation to an

matrix and is the first matrix-vector query algorithm to enjoy theoretical worst-case guarantees for approximation by any hierarchical matrix class.

Abstract

We describe a randomized algorithm for producing a near-optimal hierarchical off-diagonal low-rank (HODLR) approximation to an

matrix

, accessible only though matrix-vector products with

and

. We prove that, for the rank-

HODLR approximation problem, our method achieves a

-optimal approximation in expected Frobenius norm using

matrix-vector products. In particular, the algorithm obtains a

-optimal approximation with

matrix-vector products, and for any constant

, an

-optimal approximation with

matrix-vector products. Apart from matrix-vector products, the additional computational cost of our method is just

. We complement the upper bound with a lower bound, which shows that any matrix-vector query algorithm requires at least

queries to obtain a

-optimal approximation. Our algorithm can be viewed as a robust version of widely used "peeling" methods for recovering HODLR matrices and is, to the best of our knowledge, the first matrix-vector query algorithm to enjoy theoretical worst-case guarantees for approximation by any hierarchical matrix class. To control the propagation of error between levels of hierarchical approximation, we introduce a new perturbation bound for low-rank approximation, which shows that the widely used Generalized Nyström method enjoys inherent stability when implemented with noisy matrix-vector products. We also introduce a novel randomly perforated matrix sketching method to further control the error in the peeling algorithm.

Paper Structure (32 sections, 12 theorems, 70 equations, 8 figures, 1 table, 4 algorithms)

This paper contains 32 sections, 12 theorems, 70 equations, 8 figures, 1 table, 4 algorithms.

Introduction
Formal problem setup
Main results
The Peeling Algorithm
Low-rank approximation
Exact HODLR recovery
Peeling in the presence of error
Techniques
A perturbation bound for low-rank approximation
Low-rank approximation in the peeling algorithm
Randomly perforated Gaussian sketching
The Generalized Nyström Method Peeling Algorithm
Notation for sketching distributions
Notation for iteration and pseudocode
Range approximation
...and 17 more sections

Key Result

Theorem 1.1

Fix a rank parameter $k$ and accuracy parameter $\beta$. Let $L =\lceil \log_2(n/k) \rceil$. There exists a non-adaptiveAn algorithm that computes matrix-vector products $\mathbf{B}_1\mathbf{x}_1, \ldots, \mathbf{B}_t\mathbf{x}_t$ where $\mathbf{B}_i = \mathbf{A}$ or $\mathbf{B}_i=\mathbf{A}^\mathsf

Figures (8)

Figure 1: State of the hierarchical matrix at the start of the $\ell=2$ level of peeling. $\mathbf A^{(2)}$ denotes the matrix $\mathbf A$ after subtracting the low-rank off-diagonal blocks recovered at the first level -- these blocks are zero in $\mathbf A^{(2)}$ and shown as white in the figure. For $j = 1, 3$, we can simultaneously obtain the products $\mathbf{A}^{(2)}_{j+1,j}\mathbf{\Omega}_j$ and the transpose-products $(\mathbf{A}^{(2)}_{j+1,j})^\mathsf{T}\mathbf{\Psi}_{j+1}$ from the sketches $\mathbf{A}\mathbf{\Omega}^+$ and $\mathbf{A}^\mathsf{T}\mathbf{\Psi}^-$. Letting $\mathbf \Omega^+$ and $\mathbf \Psi^-$ have $k$ columns and blocks chosen to be appropriate sketching matrices, these products can be used to exactly recover the rank-$k$ blocks $\mathbf{A}^{(2)}_{j+1,j}$ for $j = 1,3$. Obtaining products with and recovering $\mathbf{A}^{(2)}_{j-1,j}$ for $j = 2, 4$ can be done analogously using $\mathbf{\Omega}^{-}$ and $\mathbf{\Psi}^+$. Overall, we can recover all four rank-$k$ off diagonal blocks at level $\ell =2$ using just $4k$ queries to $\mathbf A$ ($k$ for each of $\mathbf \Omega^+, \mathbf \Omega^-, \mathbf \Psi^+,$ and $\mathbf \Psi^-$).
Figure 2: State of the hierarchical matrix at the start of the $\ell=3$ level of peeling. $\mathbf A^{(3)}$ denotes the matrix $\mathbf A$ after subtracting off-diagonal blocks that were (approximately) recovered at the first two levels. We would like to use the sketches $\mathbf{A}^{(3)}_{6,5}\mathbf{\Omega}_5$ and $(\mathbf{A}^{(3)}_{6,5})^\mathsf{T}\mathbf{\Psi}_{6}$ to obtain a low-rank approximation to the off-diagonal block $\mathbf{A}^{(3)}_{6,5}$. However, if the off-diagonal blocks at previous levels were not recovered exactly (since these blocks may not be exactly rank-$k$), then after subtraction, these blocks will not be exactly zero in $\mathbf A^{(3)}$. We thus obtain perturbed sketches of the form ${\color{Green}\mathbf{A}^{(3)}_{6,5}\mathbf{\Omega}_5} + {\color{Red}\mathbf{A}^{(3)}_{6,1}\mathbf{\Omega}_1}+{\color{Red}\mathbf{A}^{(3)}_{6,3}\mathbf{\Omega}_3}+{\color{Red}\mathbf{A}^{(3)}_{6,7}\mathbf{\Omega}_7}$ and ${\color{Green}(\mathbf{A}^{(3)}_{6,5})^\mathsf{T}\mathbf{\Psi}_6} + {\color{Red}(\mathbf{A}^{(3)}_{2,5})^\mathsf{T}\mathbf{\Psi}_2}+{\color{Red}(\mathbf{A}^{(3)}_{4,5})^\mathsf{T}\mathbf{\Psi}_4}+{\color{Red}(\mathbf{A}^{(3)}_{8,5})^\mathsf{T}\mathbf{\Psi}_8}$.
Figure 3: As in \ref{['fig:peeling', 'fig:peeling_err']}, we aim to obtain the sketches $\mathbf{A}^{(3)}_{j\pm 1,j}\mathbf{\Omega}^+_j$ and $(\mathbf{A}^{(3)}_{j\pm 1,j})^\mathsf{T}\mathbf{\Psi}^-_{j\pm1}$ from the sketches $\mathbf{A}^{(3)}\mathbf{\Omega}^+$ and $(\mathbf{A}^{(3)})^\mathsf{T}\mathbf{\Psi}^-$. The key observation is that by using randomly perforated Gaussian sketches, which decrease the number of nonzero blocks per block-column (while increasing the number of column. This results in less error. We illustrate the error for block $\mathbf A_{6,5}^{(3)}$. Note that we obtain sketches ${\color{Green}\mathbf{A}^{(3)}_{6,5}\mathbf{\Omega}_5} + {\color{Red}\mathbf{A}^{(3)}_{6,1}\mathbf{\Omega}_1}$ and ${\color{Green}(\mathbf{A}^{(3)}_{6,5})^\mathsf{T}\mathbf{\Psi}_6}$. Thus, the error is decreased compared to \ref{['fig:peeling_err']}.
Figure 4: Relative error of peeling algorithms from \ref{['tab:alg_params']} on discrete inverse Poisson matrix described in \ref{['eqn:poisson_def']} as a function of the the oversampling parameter $\beta$ for several choices of the rank-parameter $k$. Legend: GN1 (), GN2 (), RSVD1 (), RSVD2 (). Takeaway: Both Generalized Nyström Method-based variants seem to perform similarly, and produce an increasingly accurate output as $1/\beta$ increases. On the other hand, the standard randomized SVD-based variant, RSVD1, stagnates. The RSVD2 variant, which uses randomly perforated sketches, converges.
Figure 5: Absolute error of peeling algorithms from \ref{['tab:alg_params']} on discrete inverse Poisson matrix described in \ref{['eqn:poisson_def']} as a function of the oversampling parameter $\beta$ with and without truncation to rank-$k$ for several choices of the rank-parameter $k$. Legend: GN1 (), GN2 (), RSVD1 (), RSVD2 (). Takeaway: Without truncation, all of the algorithms can perform significantly better. However, the resulting approximations are not $\operatorname{HODLR}(k)$, and are therefore more expensive to store and work with.
...and 3 more figures

Theorems & Definitions (19)

Definition 1.1
Theorem 1.1
Theorem 1.2
Theorem 3.1
Theorem 3.2
Definition 4.1
Definition 4.2
Definition 4.3
Definition 4.4
Definition 4.5
...and 9 more

Near-optimal hierarchical matrix approximation from matrix-vector products

TL;DR

Abstract

Near-optimal hierarchical matrix approximation from matrix-vector products

Authors

TL;DR

Abstract

Table of Contents

Key Result

Figures (8)

Theorems & Definitions (19)