Collect, Commit, Expand: Efficient CPQR-Based Column Selection for Extremely Wide Matrices

Robin Armstrong, Anil Damle

TL;DR

This paper introduces CCEQR, a deterministic CPQR-based method for selecting a small subset of columns from extremely wide matrices. By organizing the pivoting as collect–commit–expand cycles, it concentrates computational work on small candidate/tracked column sets and shifts most reflections to BLAS-3, while provably recovering the same column permutation as the Golub–Businger algorithm and achieving GB$(k)$ form. The approach significantly accelerates column subset selection (CSSP) in applications where column-norm distributions are highly nonuniform, as demonstrated on spectral clustering and density functional theory problems, with notable speedups over GEQP3 and robust performance in structured scenarios. The work also provides a formal equivalence proof, practical details for updating compact WY representations, and public code to enable reproducibility and adoption in large-scale, column-rich contexts.

Abstract

Column-pivoted QR (CPQR) factorization is a computational primitive used in numerous applications that require selecting a small set of "representative" columns from a much larger matrix. These include applications in spectral clustering, model-order reduction, low-rank approximation, and computational quantum chemistry, where the matrix being factorized has a moderate number of rows but an extremely large number of columns. We describe a modification of the Golub-Businger algorithm which, for many matrices of this type, can perform CPQR-based column selection much more efficiently. This algorithm, which we call CCEQR, is based on a three-step "collect, commit, expand" strategy that limits the number of columns being manipulated, while also transferring more computational effort from level-2 BLAS to level-3. Unlike most CPQR algorithms that exploit level-3 BLAS, CCEQR is deterministic, and provably recovers a column permutation equivalent to the one computed by the Golub-Businger algorithm. Tests on spectral clustering and Wannier basis localization problems demonstrate that on appropriately structured problems, CCEQR can significantly outperform GEQP3.
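
For readers unfamiliar with CPQR-based column selection, the following is a minimal pure-Python sketch of the greedy pivoting rule: at each step, pick the remaining column with the largest residual norm, then project it out of the other columns. It is not CCEQR and not the authors' code. It uses Gram-Schmidt projections instead of Householder reflections for brevity, and the function name `greedy_pivot_columns` and the toy matrix are assumptions introduced only for illustration.

```python
import math


def greedy_pivot_columns(A, k):
    """Select up to k column indices by greedy largest-residual-norm pivoting.

    A is a list of m rows, each a list of n floats (hypothetical toy format).
    Illustrative only: Gram-Schmidt projections stand in for Householder
    reflections, and every column is touched at every step.
    """
    m, n = len(A), len(A[0])
    # Column-major working copy, so the caller's matrix is left untouched.
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    # Squared residual norm of each column.
    norms2 = [sum(x * x for x in c) for c in cols]
    selected = []
    for _ in range(min(k, m, n)):
        # Pivot: unselected column with the largest residual norm
        # (ties resolve to the lowest index).
        p = max((j for j in range(n) if j not in selected),
                key=lambda j: norms2[j])
        if norms2[p] <= 0.0:
            break  # remaining columns lie in the span of the selected ones
        selected.append(p)
        nrm = math.sqrt(norms2[p])
        q = [x / nrm for x in cols[p]]
        # Remove the component along q from every unselected column.
        for j in range(n):
            if j in selected:
                continue
            c = cols[j]
            dot = sum(qi * ci for qi, ci in zip(q, c))
            for i in range(m):
                c[i] -= dot * q[i]
            norms2[j] = sum(x * x for x in c)
    return selected


# Toy 2 x 4 matrix whose column norms decay from left to right.
A = [[3.0, 0.0, 1.0, 0.1],
     [0.0, 2.0, 1.0, 0.1]]
picked = greedy_pivot_columns(A, 2)
```

Because this sketch updates all $n$ columns at every step, its cost grows with the full width of the matrix. That is the work the abstract says CCEQR limits by manipulating fewer columns.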

Paper Structure

This paper contains 23 sections, 3 theorems, 31 equations, 7 figures, 6 algorithms.

Key Result

Lemma 1

Let $\delta,\, \widehat{\bm{\tau}},\, \widehat{\mathbf{V}},\, \widehat{\mathbf{R}}$ be the data returned by the "collect" step of CCEQR (alg:cceqr_collect), and let $\widehat{\mathbf{\Pi}}$ be the column permutation matrix applied in line line:collect_permutatio Let $\widehat{\mathbf{Q}}$ be the unitary matrix that applies the first $c$ Householder reflection

Figures (7)

  • Figure 1: A schematic representation of CCEQR. In the "collect" stage, a set of candidate skeleton columns is selected from the tracked set. The "commit" stage chooses a subset of these candidates to bring into the skeleton. The "expand" stage brings new columns into the tracked set.
  • Figure 1: Left panel: cumulative distribution of column norm mass in ${\mathbf{W}}_m^\mathrm{T}$ as a function of column norm quantile, versus cluster separation scale $\ell$. Right panel: median runtime ratios for CCEQR and GEQP3 on $m \times n$ matrices generated from spectral demixing with $m = 20$ components and $n = 400{,}000$ data points, across increasing values of $\rho$ and increasing cluster separation, over 100 trials. Warm colors indicate that CCEQR is faster, and cool colors indicate that GEQP3 is faster. Plotted values range from $0.41$ to $15.49$. Note that CCEQR was used only to select columns, and the full ${\mathbf{R}}$ matrix was not computed.
  • Figure 2: The same experiment as \ref{fig:cluster_fixed_size} with $\ell = 6$ fixed and $n$ increasing, with and without full computation of ${\mathbf{R}}$ in CCEQR (note that ${\mathbf{R}}$ is not needed for the clustering application). Hot colors indicate CCEQR performing faster than GEQP3, while cold colors indicate GEQP3 performing faster. Left panel: plotted values range from $0.18$ to $8.31$. Right panel: values range from $0.18$ to $1.80$.
  • Figure 3: Left: median runtimes for CCEQR and GEQP3 over 10 trials on electronic wavefunctions for alkane with $m = 110$ and $n = 820{,}125$, selecting $k = m$ columns, with and without a full Householder reflection at the end of CCEQR. Runtime ratios for GEQP3 over CCEQR range from $1.31$ to $5.97$ (CSSP only) and from $1.20$ to $2.88$ (full CPQR); note that the full CPQR is not needed in this application. Right: cycle counts for CCEQR.
  • Figure 4: The same experiment as in \ref{fig:dft_alkane}, this time using wavefunctions from a water molecule with $m = 256$ and $n = 1{,}953{,}125$. See the main text for a discussion of the cycle count spike around $\rho \approx 3 \times 10^{-3}$. Runtime ratios for GEQP3 over CCEQR range from $1.26$ to $23.81$ (CSSP only) and from $1.01$ to $2.46$ (full CPQR). Note, as before, that the full CPQR is not necessary in this application.
  • ...and 2 more figures

Theorems & Definitions (7)

  • Definition 1
  • Lemma 1
  • Proof 1
  • Theorem 2
  • Proof 2
  • Lemma 1
  • Proof 3