A sublinear-time randomized algorithm for column and row subset selection based on strong rank-revealing QR factorizations

Alice Cortinovis; Lexing Ying

TL;DR

This work develops a sublinear-time randomized sRRQR framework for selecting a small subset of rows and columns to form CUR-type low-rank approximations. It combines uniform sampling with a strong rank-revealing QR refinement and proves high-probability error bounds when the target matrix has a low-rank structure, possibly perturbed by $E$, with incoherence and sparsity in its factors $X$ and $Y$. The theory delivers exact recovery guarantees in the noiseless rank-$k$ case and explicit error bounds for numerically low-rank matrices, plus an iterative variant that further refines the index sets. The results establish practical, scalable column/row subset selection under sublinear-time budgets, with clear conditions under which the method succeeds and guidance on parameter choices. Overall, the paper advances understanding of when and how sublinear randomized methods can yield accurate CUR approximations for large, structured matrices.

Abstract

In this work, we analyze a sublinear-time algorithm for selecting a few rows and columns of a matrix for low-rank approximation purposes. The algorithm is based on an initial uniformly random selection of rows and columns, followed by a refinement of this choice using a strong rank-revealing QR factorization. We prove bounds on the error of the corresponding low-rank approximation (more precisely, the CUR approximation error) when the matrix is a perturbation of a low-rank matrix that can be factorized into the product of matrices with suitable incoherence and/or sparsity assumptions.
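The two-stage idea described in the abstract (uniform sampling, then a rank-revealing refinement) can be illustrated with a minimal sketch. This is not the paper's algorithm: the helper names `greedy_pivoted_columns` and `sample_then_refine` are hypothetical, sampling without replacement is an assumption, and a plain greedy column-pivoted QR is swapped in for the strong rank-revealing QR factorization, which carries stronger guarantees.

```python
import numpy as np

def greedy_pivoted_columns(M, k):
    """Pick up to k column indices of M by greedy column pivoting.

    Assumption: this simple pivoting rule stands in for sRRQR.
    """
    R = np.array(M, dtype=float)  # working copy, M is left untouched
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        j = int(np.argmax(norms))
        if norms[j] == 0.0:
            break  # the remaining columns are numerically zero
        chosen.append(j)
        q = R[:, j] / norms[j]
        R -= np.outer(q, q @ R)  # project out the chosen direction
        R[:, j] = 0.0            # make sure column j is never picked again
    return np.array(chosen, dtype=int)

def sample_then_refine(A, k, ell, rng):
    """Uniformly sample ell rows, then refine to k columns and k rows."""
    n_rows = A.shape[0]
    I0 = rng.choice(n_rows, size=ell, replace=False)  # uniform row sample
    J = greedy_pivoted_columns(A[I0, :], k)    # k columns, seen through I0 only
    I = greedy_pivoted_columns(A[:, J].T, k)   # k rows, seen through J only
    return I, J

# Demo on an exactly rank-5 matrix with Gaussian (hence incoherent) factors.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 150))
I, J = sample_then_refine(A, k=5, ell=20, rng=rng)
U = np.linalg.pinv(A[np.ix_(I, J)])
rel_err = np.linalg.norm(A - A[:, J] @ U @ A[I, :]) / np.linalg.norm(A)
print("relative CUR error:", rel_err)
```

The selection step reads only `ell` rows and then `k` columns of `A`, so its cost grows with the matrix dimensions rather than with the number of entries; only the final error check in the demo touches the full matrix.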

Paper Structure (19 sections, 12 theorems, 66 equations, 3 figures, 2 algorithms)

Introduction
A randomized sRRQR algorithm for column/row selection
Strong rank-revealing QR factorization
The algorithm
When is there hope for Algorithm \ref{alg:variation} to work?
Error analysis of Algorithm \ref{alg:variation}
Incoherence and useful preliminary results
Analysis for low-rank matrices
Analysis for numerically low-rank matrices
Step #1: A lower bound for $\sigma_{k_1 + k_2}(A(I_0,:))$.
Step #2: The matrix $A(I_0,J_a)$ is well-conditioned.
Step #3: The matrix $Y_1(J_a,:)Y_2(J_a,:)$ is well-conditioned.
Step #4: The matrix $Y(J,:)$ is well-conditioned.
Step #5: The columns are a good subset for low-rank approximation.
Step #6: The failure probability.
...and 4 more sections

Key Result

Theorem 7

Let $X \in \mathbb{R}^{n \times k}$ be a matrix with orthonormal columns, which is $\mu$-coherent. Let $\ell \ge \alpha k \mu$, and let $I \in \{1, \ldots, n\}^\ell$ be a subset of row indices chosen uniformly at random. Then and

Figures (3)

Figure 1: Illustration of the partitioning of the factorization of the matrix $A$ from Assumption \ref{ass:exact}. The dotted matrices are sparse, the light blue matrices are incoherent, and the dark blue matrices can be anything.
Figure 2: Logarithm of the magnitude of the entries of the matrix $A$ from Example \ref{ex:function} (left) and of its first three left and right singular vectors (middle and right). The first left singular vector and the second right singular vector are approximately sparse, while all the others are incoherent.
Figure 3: The blue part of the plot indicates the low-rank approximation error, namely, the quantity $\|A - A(:,J)A(:,J)^\dagger A A(I,:)^\dagger A(I,:)\|$, where $I$ and $J$ are the index sets returned by Algorithm \ref{alg:basic} for the matrix from Example \ref{ex:function} (left) and the one from Example \ref{ex:hilbert} (right). The $x$-axis denotes the number of iterations of Algorithm \ref{alg:basic}. We run Algorithm \ref{alg:basic} 100 times; the blue dashed line represents the average error, and the light blue areas represent the 90% confidence region. The dotted magenta line is the approximation error given by the index sets returned by Algorithm \ref{alg:variation} (it is constant because the method is not iterative), and the pink area is the 90% confidence region. Note that, while for Example \ref{ex:function} the two algorithms give similar errors after the first step, a few iterations of Algorithm \ref{alg:basic} help improve the error in the case of Example \ref{ex:hilbert}.
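The error quantity in the caption of Figure 3 can be evaluated directly for any given index sets. The sketch below is an illustration under stated assumptions: the function name `cur_projection_error` is hypothetical, and the spectral norm is assumed because the caption does not say which norm is used.

```python
import numpy as np

def cur_projection_error(A, I, J, norm_ord=2):
    """Evaluate || A - A(:,J) A(:,J)^+ A A(I,:)^+ A(I,:) ||.

    Assumption: norm_ord=2 (spectral norm); the caption leaves the norm unspecified.
    """
    C = A[:, J]  # selected columns
    R = A[I, :]  # selected rows
    # Project A onto the column space of C and the row space of R.
    approx = C @ (np.linalg.pinv(C) @ A @ np.linalg.pinv(R)) @ R
    return np.linalg.norm(A - approx, ord=norm_ord)

# Demo: an exactly rank-3 matrix, using its first three rows and columns.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 20))
print("error:", cur_projection_error(A, I=[0, 1, 2], J=[0, 1, 2]))
```

Unlike the index selection itself, this evaluation reads every entry of $A$, so it is suitable for checking the quality of the chosen $I$ and $J$ on test problems rather than as part of a sublinear-time pipeline.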

Theorems & Definitions (30)

Definition 1: Strong rank-revealing QR factorization Gu1996
Remark 2
Remark 3
Example 4
Example 5
Definition 6
Theorem 7: Tropp2011
Corollary 8
proof
Theorem 9: Chiu2013
...and 20 more
