Table of Contents
Fetching ...

Near-optimal hierarchical matrix approximation from matrix-vector products

Tyler Chen, Feyza Duman Keles, Diana Halikias, Cameron Musco, Christopher Musco, David Persson

TL;DR

This work describes a randomized algorithm for producing a near-optimal hierarchical off-diagonal low-rank (HODLR) approximation to an $n\times n$ matrix and is the first matrix-vector query algorithm to enjoy theoretical worst-case guarantees for approximation by any hierarchical matrix class.

Abstract

We describe a randomized algorithm for producing a near-optimal hierarchical off-diagonal low-rank (HODLR) approximation to an $n\times n$ matrix $\mathbf{A}$, accessible only though matrix-vector products with $\mathbf{A}$ and $\mathbf{A}^{\mathsf{T}}$. We prove that, for the rank-$k$ HODLR approximation problem, our method achieves a $(1+β)^{\log(n)}$-optimal approximation in expected Frobenius norm using $O(k\log(n)/β^3)$ matrix-vector products. In particular, the algorithm obtains a $(1+\varepsilon)$-optimal approximation with $O(k\log^4(n)/\varepsilon^3)$ matrix-vector products, and for any constant $c$, an $n^c$-optimal approximation with $O(k \log(n))$ matrix-vector products. Apart from matrix-vector products, the additional computational cost of our method is just $O(n \operatorname{poly}(\log(n), k, β))$. We complement the upper bound with a lower bound, which shows that any matrix-vector query algorithm requires at least $Ω(k\log(n) + k/\varepsilon)$ queries to obtain a $(1+\varepsilon)$-optimal approximation. Our algorithm can be viewed as a robust version of widely used "peeling" methods for recovering HODLR matrices and is, to the best of our knowledge, the first matrix-vector query algorithm to enjoy theoretical worst-case guarantees for approximation by any hierarchical matrix class. To control the propagation of error between levels of hierarchical approximation, we introduce a new perturbation bound for low-rank approximation, which shows that the widely used Generalized Nyström method enjoys inherent stability when implemented with noisy matrix-vector products. We also introduce a novel randomly perforated matrix sketching method to further control the error in the peeling algorithm.

Near-optimal hierarchical matrix approximation from matrix-vector products

TL;DR

This work describes a randomized algorithm for producing a near-optimal hierarchical off-diagonal low-rank (HODLR) approximation to an matrix and is the first matrix-vector query algorithm to enjoy theoretical worst-case guarantees for approximation by any hierarchical matrix class.

Abstract

We describe a randomized algorithm for producing a near-optimal hierarchical off-diagonal low-rank (HODLR) approximation to an matrix , accessible only though matrix-vector products with and . We prove that, for the rank- HODLR approximation problem, our method achieves a -optimal approximation in expected Frobenius norm using matrix-vector products. In particular, the algorithm obtains a -optimal approximation with matrix-vector products, and for any constant , an -optimal approximation with matrix-vector products. Apart from matrix-vector products, the additional computational cost of our method is just . We complement the upper bound with a lower bound, which shows that any matrix-vector query algorithm requires at least queries to obtain a -optimal approximation. Our algorithm can be viewed as a robust version of widely used "peeling" methods for recovering HODLR matrices and is, to the best of our knowledge, the first matrix-vector query algorithm to enjoy theoretical worst-case guarantees for approximation by any hierarchical matrix class. To control the propagation of error between levels of hierarchical approximation, we introduce a new perturbation bound for low-rank approximation, which shows that the widely used Generalized Nyström method enjoys inherent stability when implemented with noisy matrix-vector products. We also introduce a novel randomly perforated matrix sketching method to further control the error in the peeling algorithm.
Paper Structure (32 sections, 12 theorems, 70 equations, 8 figures, 1 table, 4 algorithms)

This paper contains 32 sections, 12 theorems, 70 equations, 8 figures, 1 table, 4 algorithms.

Key Result

Theorem 1.1

Fix a rank parameter $k$ and accuracy parameter $\beta$. Let $L =\lceil \log_2(n/k) \rceil$. There exists a non-adaptiveAn algorithm that computes matrix-vector products $\mathbf{B}_1\mathbf{x}_1, \ldots, \mathbf{B}_t\mathbf{x}_t$ where $\mathbf{B}_i = \mathbf{A}$ or $\mathbf{B}_i=\mathbf{A}^\mathsf

Figures (8)

  • Figure 1: State of the hierarchical matrix at the start of the $\ell=2$ level of peeling. $\mathbf A^{(2)}$ denotes the matrix $\mathbf A$ after subtracting the low-rank off-diagonal blocks recovered at the first level -- these blocks are zero in $\mathbf A^{(2)}$ and shown as white in the figure. For $j = 1, 3$, we can simultaneously obtain the products $\mathbf{A}^{(2)}_{j+1,j}\mathbf{\Omega}_j$ and the transpose-products $(\mathbf{A}^{(2)}_{j+1,j})^\mathsf{T}\mathbf{\Psi}_{j+1}$ from the sketches $\mathbf{A}\mathbf{\Omega}^+$ and $\mathbf{A}^\mathsf{T}\mathbf{\Psi}^-$. Letting $\mathbf \Omega^+$ and $\mathbf \Psi^-$ have $k$ columns and blocks chosen to be appropriate sketching matrices, these products can be used to exactly recover the rank-$k$ blocks $\mathbf{A}^{(2)}_{j+1,j}$ for $j = 1,3$. Obtaining products with and recovering $\mathbf{A}^{(2)}_{j-1,j}$ for $j = 2, 4$ can be done analogously using $\mathbf{\Omega}^{-}$ and $\mathbf{\Psi}^+$. Overall, we can recover all four rank-$k$ off diagonal blocks at level $\ell =2$ using just $4k$ queries to $\mathbf A$ ($k$ for each of $\mathbf \Omega^+, \mathbf \Omega^-, \mathbf \Psi^+,$ and $\mathbf \Psi^-$).
  • Figure 2: State of the hierarchical matrix at the start of the $\ell=3$ level of peeling. $\mathbf A^{(3)}$ denotes the matrix $\mathbf A$ after subtracting off-diagonal blocks that were (approximately) recovered at the first two levels. We would like to use the sketches $\mathbf{A}^{(3)}_{6,5}\mathbf{\Omega}_5$ and $(\mathbf{A}^{(3)}_{6,5})^\mathsf{T}\mathbf{\Psi}_{6}$ to obtain a low-rank approximation to the off-diagonal block $\mathbf{A}^{(3)}_{6,5}$. However, if the off-diagonal blocks at previous levels were not recovered exactly (since these blocks may not be exactly rank-$k$), then after subtraction, these blocks will not be exactly zero in $\mathbf A^{(3)}$. We thus obtain perturbed sketches of the form ${\color{Green}\mathbf{A}^{(3)}_{6,5}\mathbf{\Omega}_5} + {\color{Red}\mathbf{A}^{(3)}_{6,1}\mathbf{\Omega}_1}+{\color{Red}\mathbf{A}^{(3)}_{6,3}\mathbf{\Omega}_3}+{\color{Red}\mathbf{A}^{(3)}_{6,7}\mathbf{\Omega}_7}$ and ${\color{Green}(\mathbf{A}^{(3)}_{6,5})^\mathsf{T}\mathbf{\Psi}_6} + {\color{Red}(\mathbf{A}^{(3)}_{2,5})^\mathsf{T}\mathbf{\Psi}_2}+{\color{Red}(\mathbf{A}^{(3)}_{4,5})^\mathsf{T}\mathbf{\Psi}_4}+{\color{Red}(\mathbf{A}^{(3)}_{8,5})^\mathsf{T}\mathbf{\Psi}_8}$.
  • Figure 3: As in \ref{['fig:peeling', 'fig:peeling_err']}, we aim to obtain the sketches $\mathbf{A}^{(3)}_{j\pm 1,j}\mathbf{\Omega}^+_j$ and $(\mathbf{A}^{(3)}_{j\pm 1,j})^\mathsf{T}\mathbf{\Psi}^-_{j\pm1}$ from the sketches $\mathbf{A}^{(3)}\mathbf{\Omega}^+$ and $(\mathbf{A}^{(3)})^\mathsf{T}\mathbf{\Psi}^-$. The key observation is that by using randomly perforated Gaussian sketches, which decrease the number of nonzero blocks per block-column (while increasing the number of column. This results in less error. We illustrate the error for block $\mathbf A_{6,5}^{(3)}$. Note that we obtain sketches ${\color{Green}\mathbf{A}^{(3)}_{6,5}\mathbf{\Omega}_5} + {\color{Red}\mathbf{A}^{(3)}_{6,1}\mathbf{\Omega}_1}$ and ${\color{Green}(\mathbf{A}^{(3)}_{6,5})^\mathsf{T}\mathbf{\Psi}_6}$. Thus, the error is decreased compared to \ref{['fig:peeling_err']}.
  • Figure 4: Relative error of peeling algorithms from \ref{['tab:alg_params']} on discrete inverse Poisson matrix described in \ref{['eqn:poisson_def']} as a function of the the oversampling parameter $\beta$ for several choices of the rank-parameter $k$. Legend: GN1 (), GN2 (), RSVD1 (), RSVD2 (). Takeaway: Both Generalized Nyström Method-based variants seem to perform similarly, and produce an increasingly accurate output as $1/\beta$ increases. On the other hand, the standard randomized SVD-based variant, RSVD1, stagnates. The RSVD2 variant, which uses randomly perforated sketches, converges.
  • Figure 5: Absolute error of peeling algorithms from \ref{['tab:alg_params']} on discrete inverse Poisson matrix described in \ref{['eqn:poisson_def']} as a function of the oversampling parameter $\beta$ with and without truncation to rank-$k$ for several choices of the rank-parameter $k$. Legend: GN1 (), GN2 (), RSVD1 (), RSVD2 (). Takeaway: Without truncation, all of the algorithms can perform significantly better. However, the resulting approximations are not $\operatorname{HODLR}(k)$, and are therefore more expensive to store and work with.
  • ...and 3 more figures

Theorems & Definitions (19)

  • Definition 1.1
  • Theorem 1.1
  • Theorem 1.2
  • Theorem 3.1
  • Theorem 3.2
  • Definition 4.1
  • Definition 4.2
  • Definition 4.3
  • Definition 4.4
  • Definition 4.5
  • ...and 9 more