Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms
Yi Li, Honghao Lin, David P. Woodruff
TL;DR
The paper addresses fast, memory-efficient residual error estimation for matrix and vector norms using oblivious bilinear sketches. It proves a tight $Θ(k^2/ε^4)$ bound on the sketch size for the matrix residual problem and provides matching lower bounds, while enabling sparse, fast-to-update constructions via PCP-based methods. It also extends the framework to $F_p$ residuals with $p>2$ by giving near-optimal space for sparse recovery, including a matching lower bound, and demonstrates practical gains through experiments that show substantial speedups over prior approaches. These results yield a practical toolkit for deciding when to invest in expensive low-rank or sparse recovery computations in large-scale or streaming settings. Overall, the work advances the theory and practice of residual-error-aware sketching for both matrix and vector norms with strong theoretical guarantees and empirical validation.
Abstract
We study the problem of residual error estimation for matrix and vector norms using a linear sketch. Such estimates can be used, for example, to quickly assess how useful a more expensive low-rank approximation computation will be. The matrix case concerns the Frobenius norm and the task is to approximate the $k$-residual $\|A - A_k\|_F$ of the input matrix $A$ within a $(1+ε)$-factor, where $A_k$ is the optimal rank-$k$ approximation. We provide a tight bound of $Θ(k^2/ε^4)$ on the size of bilinear sketches, which have the form of a matrix product $SAT$. This improves the previous $O(k^2/ε^6)$ upper bound in (Andoni et al. SODA 2013) and gives the first non-trivial lower bound, to the best of our knowledge. In our algorithm, our sketching matrices $S$ and $T$ can both be sparse matrices, allowing for a very fast update time. We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work. For the vector case, we consider the $\ell_p$-norm for $p>2$, where the task is to approximate the $k$-residual $\|x - x_k\|_p$ up to a constant factor, where $x_k$ is the optimal $k$-sparse approximation to $x$. Such vector norms are frequently studied in the data stream literature and are useful for finding frequent items or so-called heavy hitters. We establish an upper bound of $O(k^{2/p}n^{1-2/p}\operatorname{poly}(\log n))$ for constant $ε$ on the dimension of a linear sketch for this problem. Our algorithm can be extended to the $\ell_p$ sparse recovery problem with the same sketching dimension, which seems to be the first such bound for $p > 2$. We also show an $Ω(k^{2/p}n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
