How to reveal the rank of a matrix?

Anil Damle, Silke Glas, Alex Townsend, Annan Yu

TL;DR

This work establishes a unifying framework for rank-revealers based on pivoting in Gaussian elimination (GE) and QR, centered on local maximum volume (LMV) pivots. It proves that local (and near-local) maximum volume pivots are necessary and sufficient to obtain reliable estimates of the leading and trailing singular values, with explicit bounds $\mu_{m,n,k}$ and interpolative constants $\nu$. The authors present practical algorithms (including near-LMV variants) and a readily computable metric $\mu_B$ for assessing pivot quality, and they observe that column-pivoted QR (CPQR) and GE with complete pivoting (GECP) typically achieve near-LMV behavior in practice. They provide theoretical reductions, algorithmic strategies, and applications (kernel approximations, model order reduction (MOR), and localized orbital functions) to illustrate the broad impact of rank-revealers. The work thus links theory and practice, offering a principled path to fast, reliable rank estimation without computing an SVD.

Abstract

We study algorithms called rank-revealers that reveal a matrix's rank structure. Such algorithms form a fundamental component in matrix compression, singular value estimation, and column subset selection problems. While column-pivoted QR (CPQR) has been widely adopted due to its practicality, it is not always a rank-revealer. Conversely, Gaussian elimination (GE) with a pivoting strategy known as global maximum volume pivoting is guaranteed to estimate a matrix's singular values, but its exponential complexity limits its interest to theory. We show that the concept of local maximum volume pivoting is a crucial and practical pivoting strategy for rank-revealers based on GE and QR. In particular, we prove that it is both necessary and sufficient, highlighting that all local solutions are nearly as good as the global one. This insight elevates Gu and Eisenstat's rank-revealing QR as an archetypal rank-revealer, and we implement a version that is observed to be at most $2\times$ more computationally expensive than CPQR. We unify the landscape of rank-revealers by considering GE and QR together and prove that the success of any pivoting strategy can be assessed by benchmarking it against a local maximum volume pivot.
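To make "pivoting in GE" concrete, here is a minimal pure-Python sketch of GE with complete pivoting (GECP), the standard strategy that picks the largest remaining entry at each step. It is an illustration only and not the authors' implementation: the function name `gecp`, the absolute tolerance `tol`, and the small rank-2 test matrix are all assumptions made for this example.

```python
def gecp(A, k, tol=1e-12):
    """Run up to k steps of Gaussian elimination with complete pivoting.

    Works on a copy of A (a list of row lists). Returns the pivot rows,
    the pivot columns, and the residual matrix left after elimination.
    """
    R = [row[:] for row in A]
    m, n = len(R), len(R[0])
    rows, cols = [], []
    for _ in range(k):
        # Complete pivoting: pick the entry of largest magnitude in the residual.
        i, j = max(((r, c) for r in range(m) for c in range(n)),
                   key=lambda rc: abs(R[rc[0]][rc[1]]))
        pivot = R[i][j]
        if abs(pivot) <= tol:  # residual is numerically zero, so stop early
            break
        col = [R[r][j] / pivot for r in range(m)]
        row = R[i][:]
        # Rank-one update: subtract the outer product of pivot column and pivot row.
        for r in range(m):
            for c in range(n):
                R[r][c] -= col[r] * row[c]
        rows.append(i)
        cols.append(j)
    return rows, cols, R


# A 4 x 4 matrix of rank 2, built as a sum of two outer products.
u1, v1 = [1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 1.0, 0.0]
u2, v2 = [0.0, 1.0, 0.0, 1.0], [2.0, 1.0, 0.0, 1.0]
A = [[u1[i] * v1[j] + u2[i] * v2[j] for j in range(4)] for i in range(4)]

rows, cols, residual = gecp(A, k=4)
print("pivots taken:", len(rows))
print("largest residual entry:", max(abs(x) for r in residual for x in r))
```

On this exactly rank-2 example, the elimination stops once the residual is numerically zero, so the number of pivots matches the rank. For matrices that are only approximately low rank, the quality of the estimate depends on the pivots chosen, which is the question the paper addresses.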

Paper Structure

This paper contains 36 sections, 9 theorems, 78 equations, 8 figures, 2 tables, 2 algorithms.

Key Result

Theorem 1

GE with local maximum volume pivoting is a rank-revealer with $\mu_{m,n,k} = 1+5k\sqrt{mn}$ (see eq:GoodLeadingSV and eq:GoodTrailingSV) and computes a partial LU factorization satisfying interpolative bounds with $\nu\leq 1$ (see def:InterpolativeBoundsGE).
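The notion of a local maximum volume pivot can be illustrated with a small pure-Python sketch. It assumes that the volume of a square submatrix is the absolute value of its determinant, and that a $k\times k$ submatrix has local maximum volume when no submatrix differing from it in exactly one row or one column has larger volume. The helper names (`det`, `volume`, `neighbors`, `local_max_volume`) and the steepest-ascent swap rule are choices made for this illustration; the paper's own algorithm may differ.

```python
import random


def det(M):
    """Determinant via Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n, d = len(M), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if M[p][c] == 0.0:
            return 0.0
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c + 1, n):
                M[r][j] -= f * M[c][j]
    return d


def volume(A, rows, cols):
    """Assumed definition: |det| of the submatrix A[rows, cols]."""
    return abs(det([[A[i][j] for j in cols] for i in rows]))


def neighbors(rows, cols, m, n):
    """Index sets that differ from (rows, cols) in exactly one row or one column."""
    for pos in range(len(rows)):
        for i in range(m):
            if i not in rows:
                yield rows[:pos] + (i,) + rows[pos + 1:], cols
    for pos in range(len(cols)):
        for j in range(n):
            if j not in cols:
                yield rows, cols[:pos] + (j,) + cols[pos + 1:]


def local_max_volume(A, rows, cols):
    """Swap one row or column at a time until no single swap increases the volume."""
    m, n = len(A), len(A[0])
    rows, cols = tuple(rows), tuple(cols)
    vol, steps = volume(A, rows, cols), 0
    while True:
        best = max(neighbors(rows, cols, m, n), key=lambda rc: volume(A, *rc))
        best_vol = volume(A, *best)
        if best_vol <= vol * (1 + 1e-12):  # no strict improvement: local maximum
            return rows, cols, vol, steps
        (rows, cols), vol, steps = best, best_vol, steps + 1


random.seed(0)
A = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(8)]
rows, cols, vol, steps = local_max_volume(A, (0, 1, 2), (0, 1, 2))
print("rows:", sorted(rows), "cols:", sorted(cols), "swaps:", steps)
```

The search terminates because the volume strictly increases with every swap and there are finitely many submatrices. By Cramer's rule, each entry of $A_{21}A_{11}^{-1}$ is, up to sign, the ratio of the determinant of $A_{11}$ with one row swapped out to $\det(A_{11})$, which gives some intuition for why a local maximum volume pivot is compatible with an interpolative constant $\nu\leq 1$.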

Figures (8)

  • Figure 1: Different pivoting strategies in GE and QR can be more or less computationally efficient as well as better or worse singular value estimators. In this paper, we show that local maximum volume pivoting (see \ref{sec:maxvolpivoting}) is a balance between computational efficiency and theoretical guarantees. Near-local maximum volume pivoting is a necessary and sufficient pivoting strategy for GE and QR to be rank-revealers (see \ref{thm.sufficientLU}, \ref{thm.sufficientQR}, \ref{thm.necessaryLU}, and \ref{thm.necessaryQR2}).
  • Figure 1: Left: Four representative paths taken by \ref{alg:localMaxVol} on the volume submatrix graph for an $11\times 11$ random Gaussian matrix with $3\times 3$ submatrices. There are $27{,}225$ nodes in the volume submatrix graph. The graph is so large that the individual nodes have merged; instead, we see a color map of the node values. Starting at these four corner submatrices, \ref{alg:localMaxVol} finds local maximum volume submatrices with path lengths of 3 (blue), 7 (magenta), 8 (red), and 10 (black). Two of the paths (red and magenta) coalesce and follow each other from then on. Right: Histogram of the distribution of path lengths to find a local maximum volume submatrix using \ref{alg:localMaxVol}, starting at all $27{,}225$ nodes. Despite there being only three local maximum volume submatrices, the maximum path length to find one is 16.
  • Figure 1: The computational cost of GE and QR for $1\leq k\leq 500$ with near-local maximum volume pivoting on a $500\times 500$ randomly generated matrix with standard Gaussian entries. Here, we are finding a near-local maximum with \ref{alg:NearLocalMaxVol}. Left: For GE, we select $\gamma=3$ and compare the timings against GECP. We find that GE with $3$-local maximum volume pivoting is no more than $1.4\times$ slower than GECP. A similar observation is made in \cite{schork2020rank}. Right: For QR, we select $\gamma=2$ and compare the timings against our implementation of CPQR. We find that QR with $2$-local maximum volume pivoting is no more than $2\times$ slower than CPQR.
  • Figure 1: We observe that, with extremely high probability, GECP (left) and CPQR (right) find near-local maximum volume submatrices with small $\mu_B$ on $50\times 50$ random Gaussian matrices with $k=20$. Here, we randomly generate 10,000 Gaussian matrices, compute the metric in \ref{eq:metric}, and plot a histogram. Despite the worst-case bound on $\mu_B$ being exponential in $k$ for GECP and CPQR, most of the time these pivoting strategies find submatrices with small $\mu_B$.
  • Figure 1: Left: The skeleton selected by GECP for the function $f(x,y) = {\rm Ai}(5(x+y^2)){\rm Ai}(-5(x^2+y))$ on $[-1,1]\times [-1,1]$ when discretized on a $129\times 129$ bivariate Chebyshev grid and $k=20$. Right: GECP skeleton for the function $f(x,y) = 1/(1+100(1/2-x^2-y^2)^2)$ on $[-1,1]\times [-1,1]$ when discretized on a $513\times 513$ bivariate Chebyshev grid and $k=65$.
  • ...and 3 more figures
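The node count quoted in the caption on the volume submatrix graph can be checked with a short calculation: a $3\times 3$ submatrix of an $11\times 11$ matrix is fixed by choosing 3 of 11 rows and 3 of 11 columns. The sketch below also counts the neighbors of each node, under the assumption (not stated in the captions above) that two nodes are adjacent when the submatrices differ in exactly one row or one column.

```python
from math import comb

m = n = 11  # matrix dimensions from the caption
k = 3       # submatrix size from the caption

# One node per choice of k rows and k columns.
nodes = comb(m, k) * comb(n, k)

# Assumed adjacency: replace one of the k rows by one of the m - k unused rows,
# or one of the k columns by one of the n - k unused columns.
degree = k * (m - k) + k * (n - k)

print(nodes)   # 27225
print(degree)  # 48
```

Under this assumption each step of a local search compares only 48 candidates, whereas a global maximum volume search would have to consider all 27,225 submatrices.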

Theorems & Definitions (27)

  • Definition 1
  • Definition 2
  • Definition 1: Volume
  • Definition 2: Local maximum volume
  • Example 1
  • Theorem 1
  • Proof 1
  • Example 2
  • Theorem 1
  • Proof 2
  • ...and 17 more