A sublinear-time randomized algorithm for column and row subset selection based on strong rank-revealing QR factorizations
Alice Cortinovis, Lexing Ying
TL;DR
This work analyzes a sublinear-time randomized algorithm, based on strong rank-revealing QR (sRRQR) factorizations, for selecting a small subset of rows and columns to form a CUR-type low-rank approximation. The algorithm combines uniform sampling with an sRRQR refinement, and the analysis gives high-probability error bounds when the target matrix is a low-rank product of factors $X$ and $Y$, possibly perturbed by $E$, where the factors satisfy incoherence and/or sparsity assumptions. The theory delivers exact recovery guarantees in the noiseless rank-$k$ case and explicit error bounds for numerically low-rank matrices, plus an iterative variant that further refines the index sets. The results give clear conditions under which sublinear-time column/row subset selection succeeds, along with guidance on parameter choices.
Abstract
In this work, we analyze a sublinear-time algorithm for selecting a few rows and columns of a matrix for low-rank approximation purposes. The algorithm is based on an initial uniformly random selection of rows and columns, followed by a refinement of this choice using a strong rank-revealing QR factorization. We prove bounds on the error of the corresponding low-rank approximation (more precisely, the CUR approximation error) when the matrix is a perturbation of a low-rank matrix that can be factorized into the product of matrices with suitable incoherence and/or sparsity assumptions.
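The sample-then-refine idea can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper. The function names, the sample size `ell`, and the choice to refine both index sets from the same sampled submatrix are hypothetical. A greedy column-pivoting step stands in for the strong rank-revealing QR factorization; it is a weaker heuristic and is used here only to keep the example self-contained. The sketch reads an `ell`-by-`ell` submatrix plus the `k` selected rows and columns, so it never touches the full matrix during selection.

```python
import numpy as np


def greedy_pivots(S, k):
    """Pick k column indices of S by greedy column pivoting.

    Assumption: this is a simple stand-in for sRRQR, not sRRQR itself.
    It assumes S has rank at least k (no guard against zero norms).
    """
    R = np.array(S, dtype=float)
    chosen = []
    for _ in range(k):
        norms = np.sum(R**2, axis=0)
        j = int(np.argmax(norms))           # column with largest residual norm
        chosen.append(j)
        q = R[:, j] / np.sqrt(norms[j])     # unit vector along that column
        R -= np.outer(q, q @ R)             # deflate: remove its direction
    return np.array(chosen)


def sample_then_refine(A, k, ell, rng):
    """Hypothetical helper: uniform sampling followed by pivoted refinement."""
    m, n = A.shape
    I0 = rng.choice(m, size=ell, replace=False)   # uniform row sample
    J0 = rng.choice(n, size=ell, replace=False)   # uniform column sample
    S = A[np.ix_(I0, J0)]                         # only ell*ell entries are read
    J = J0[greedy_pivots(S, k)]                   # refine columns
    I = I0[greedy_pivots(S.T, k)]                 # refine rows
    return I, J


# Noiseless rank-k test matrix A = X @ Y with Gaussian factors (an assumption
# chosen for the demo; such factors are incoherent with high probability).
rng = np.random.default_rng(0)
m, n, k = 200, 150, 5
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))

I, J = sample_then_refine(A, k, ell=20, rng=rng)

# CUR approximation: A[:, J] @ inv(A[I, J]) @ A[I, :]
C, U, R = A[:, J], A[np.ix_(I, J)], A[I, :]
approx = C @ np.linalg.solve(U, R)
rel_err = np.linalg.norm(A - approx) / np.linalg.norm(A)
print(f"relative CUR error: {rel_err:.2e}")
```

In the noiseless rank-`k` case the CUR approximation is exact whenever the `k`-by-`k` intersection block `A[I, J]` is invertible, so the printed error should be at the level of rounding. For a perturbed matrix the error depends on how well conditioned the selected block is, which is what the refinement step is meant to control.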
