Column and row subset selection using nuclear scores: algorithms and theory for Nyström approximation, CUR decomposition, and graph Laplacian reduction

Mark Fornace; Michael Lindsey

TL;DR

This work develops unified, structure-preserving column/row subset selection methods for Nyström approximation, CUR decomposition, and inverse graph-Laplacian rank reduction via nuclear-score maximization. It introduces deterministic and fully matrix-free algorithms, the latter relying on randomized diagonal estimation with concentration guarantees to bound scores while avoiding explicit formation of large kernel or Laplacian-derived operators. The authors provide extensive error analyses, connecting greedy nuclear maximization to DPP expectations and deriving submodularity-based bounds for the Laplacian case, with strong empirical results across kernel, CUR, and Laplacian tasks. The framework achieves competitive or superior accuracy with favorable computational scaling, supported by open-source implementations and demonstrations on large-scale problems.
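The randomized diagonal estimation mentioned above can be illustrated with a Hutchinson-style estimator, which recovers the diagonal of an operator from matrix-vector products alone. This is a minimal sketch under assumptions, not the authors' estimator: the Rademacher probes, the function name `estimate_diagonal`, the test matrix, and the probe count are all hypothetical choices made for illustration.

```python
import numpy as np

def estimate_diagonal(matvec, n, z, rng):
    """Estimate diag(M) from z matrix-vector products with M.

    Uses Rademacher probes w (entries +/-1): E[w * (M w)] = diag(M).
    This is a generic Hutchinson-style estimator, assumed here for
    illustration; the paper's exact scheme may differ.
    """
    W = rng.choice([-1.0, 1.0], size=(n, z))                   # probe vectors as columns
    MW = np.column_stack([matvec(W[:, i]) for i in range(z)])  # only matvecs touch M
    return np.mean(W * MW, axis=1)                             # average of w * (M w)

# Hypothetical example: a dense PSD matrix accessed only through matvecs.
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
M = B @ B.T
diag_est = estimate_diagonal(lambda v: M @ v, n=50, z=200, rng=rng)
rel_err = np.linalg.norm(diag_est - np.diag(M)) / np.linalg.norm(np.diag(M))
```

The estimate is unbiased, and its variance for entry $i$ is the sum of squared off-diagonal entries in row $i$ divided by the number of probes, so accuracy improves as the operator becomes more diagonally dominant or as more probes are used.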

Abstract

Column selection is an essential tool for structure-preserving low-rank approximation, with wide-ranging applications across many fields, such as data science, machine learning, and theoretical chemistry. In this work, we develop unified methodologies for fast, efficient, and theoretically guaranteed column selection. First we derive and implement a sparsity-exploiting deterministic algorithm applicable to tasks including kernel approximation and CUR decomposition. Next, we develop a matrix-free formalism relying on a randomization scheme satisfying guaranteed concentration bounds, applying this construction both to CUR decomposition and to the approximation of matrix functions of graph Laplacians. Importantly, the randomization is only relevant for the computation of the scores that we use for column selection, not the selection itself given these scores. For both deterministic and matrix-free algorithms, we bound the performance favorably relative to the expected performance of determinantal point process (DPP) sampling and, in select scenarios, that of exactly optimal subset selection. The general case requires new analysis of the DPP expectation. Finally, we demonstrate strong real-world performance of our algorithms on a diverse set of example approximation tasks.
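As a rough illustration of greedy nuclear-score selection for a positive semidefinite kernel, the sketch below picks, at each step, the column whose pivot removes the most trace from the current residual. It is a dense reference version under stated assumptions, not the sparsity-exploiting or matrix-free algorithms developed in the paper: the score formula $\|R_{:,j}\|^2 / R_{jj}$, the function name `greedy_nuclear_selection`, and the small Gaussian-kernel example are choices made here for illustration.

```python
import numpy as np

def greedy_nuclear_selection(K, k, tol=1e-12):
    """Greedily select up to k columns of a PSD matrix K.

    Assumed score: ||R[:, j]||^2 / R[j, j], which equals the trace removed
    from the residual R when pivoting on column j. Dense O(n^2 k) sketch.
    """
    R = np.array(K, dtype=float, copy=True)   # residual, starts as K
    selected = []
    for _ in range(k):
        diag = np.diag(R).copy()
        col_sq_norms = np.sum(R * R, axis=0)
        scores = np.full(len(diag), -np.inf)
        valid = diag > tol                    # skip numerically exhausted columns
        scores[valid] = col_sq_norms[valid] / diag[valid]
        scores[selected] = -np.inf            # never reselect a column
        j = int(np.argmax(scores))
        if not np.isfinite(scores[j]):
            break                             # residual is numerically zero
        selected.append(j)
        # Rank-one downdate: R <- R - R[:, j] R[j, :] / R[j, j]
        R = R - np.outer(R[:, j], R[:, j]) / R[j, j]
    return selected, R

# Hypothetical example: squared exponential kernel on random 2D points.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2 * 0.4 ** 2))
selected, R = greedy_nuclear_selection(K, 20)
rel_trace_error = np.trace(R) / np.trace(K)
```

Here `rel_trace_error` is the fraction of $\mathrm{Tr}[K]$ left unexplained by the selected columns. Forming the full residual is only practical for small problems, which is the cost that the sparse and matrix-free variants described in the abstract are meant to avoid.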

Paper Structure (75 sections, 47 theorems, 186 equations, 6 figures, 4 tables, 6 algorithms)

This paper contains 75 sections, 47 theorems, 186 equations, 6 figures, 4 tables, 6 algorithms.

Introduction
Past work
Spectral methods:
Leverage score methods:
Pivoted decompositions:
Additional sampling approaches:
Experimental design:
Submodularity:
Alternative and related methods
Uniform sampling
Diagonal maximization
Diagonal sampling
Determinantal point process sampling
Nuclear maximization
Nomenclature
...and 60 more sections

Key Result

lemma 2.1

$\epsilon_A(\mathcal{J}) = \mathcal{E}_K(\mathcal{J})$, where $K = A^{\top} A$.
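The identity can be checked numerically under one plausible reading of the notation, which this page does not define. The sketch below assumes that $\epsilon_A(\mathcal{J})$ is the squared Frobenius error of projecting $A$ onto the span of its columns indexed by $\mathcal{J}$, and that $\mathcal{E}_K(\mathcal{J})$ is the trace of the Nyström residual of $K$. Both definitions, the random test matrix, and the index set are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 12))   # hypothetical test matrix
J = [0, 3, 7]                       # hypothetical column subset

# Assumed eps_A(J): || A - P_J A ||_F^2, with P_J the orthogonal projector
# onto the span of the selected columns C = A[:, J].
C = A[:, J]
proj_A = C @ np.linalg.lstsq(C, A, rcond=None)[0]
eps_A = np.linalg.norm(A - proj_A, "fro") ** 2

# Assumed E_K(J): Tr[K - K[:, J] K[J, J]^{-1} K[J, :]] with K = A^T A.
K = A.T @ A
nystrom = K[:, J] @ np.linalg.solve(K[np.ix_(J, J)], K[J, :])
E_K = np.trace(K - nystrom)

gap = abs(eps_A - E_K)              # agrees up to floating-point rounding
```

Under these assumed definitions the agreement follows from $\|A - P_{\mathcal{J}} A\|_F^2 = \mathrm{Tr}[A^{\top}(I - P_{\mathcal{J}})A]$, which expands to the Nyström residual trace when $P_{\mathcal{J}} = C (C^{\top} C)^{-1} C^{\top}$.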

Figures (6)

Figure 1: Overview of theoretical error bounds. Dashed arrows refer to inequalities in the form of upper (best-case) bounds. Solid arrows refer to inequalities in the form of lower (worst-case) bounds. Inequalities (d), (e), and (f) are elementary and indicate the best performance one could achieve. Inequality (a) is from the theory of $k$-DPP sampling and elementary symmetric polynomials, while inequalities (b) and (c) are new contributions.
Figure 2: Adversarial examples for kernel and graph Laplacian reduction. (a) Relative objective values $\mathcal{L}_K(\mathcal{I}) / \mathrm{Tr}[K]$ for the example (\ref{eq:pathological1}) with $n=2000$, $n_c=45$, $\alpha=1.00001$. Nuclear maximization matches the eigenvalue bound, while each other method performs substantially worse. Diagonal maximization is worst of all, picking outlier nodes at each iteration, but uniform and diagonal sampling perform worse as well (and are visually coincident). (b) Relative objective values $\mathcal{L}_L^h(\mathcal{I}) / \mathrm{Tr}[L^+]$ obtained for the star graph Laplacian construction (\ref{eq:pathological-laplacian}) with $n=100$, $\beta=0.9999$. Nuclear maximization matches the eigenvalue bound, while the other methods are qualitatively inferior (and are visually coincident). Convergence in the latter methods is substantially slower since the central node is picked only after a number of iterations. In each panel, for randomized column selection approaches, medians are plotted in 20% and 80% quantile envelopes over 1000 replicates.
Figure 3: Illustrative examples of relative trace error ($1 - \mathcal{L}_K(\mathcal{I})/\mathrm{Tr}[K]$) derived from kernel approximation. (a) A squared exponential kernel ($\sigma=0.4$) on a cloud of 1000 randomly distributed points from $\mathcal{N}(\mathbf{0}_2, \mathbb{I}_2)$. (b) From Chen2022-lp, a squared exponential kernel ($\sigma=10^3$) over $10^4$ points arranged in a spiral $(e^{t/5} \cos(t), e^{t/5} \sin(t))$ for evenly spaced $t$ in $[0,64]$. (c) From Chen2022-lp, a squared exponential kernel ($\sigma=2$) over 10,000 points distributed uniformly in a 2D smiley face (50 per eye, 1,980 for the smile, and 7,920 for the outline). In each subfigure, uniform and diagonal sampling results are plotted according to their medians over 100 replicates, with error envelopes from 20% and 80% quantiles depicted as well.
Figure 4: Relative Frobenius error $\|CUR - A\|_F / \|A\|_F$ for collected examples of CUR decomposition of sparse matrices from Table \ref{tab:suitesparse} using deterministic scoring (Algorithm \ref{alg:exact-cholesky}). For uniform and diagonal sampling, medians are plotted with error envelopes from 20% and 80% quantiles, based on results over 10 trials. Top singular values were computed using subspace iteration.
Figure 5: Matrix-free equivalent to Figure \ref{fig:cur-suitesparse-deterministic}, using Algorithm \ref{alg:randomized-cholesky} and $z=200$ for each randomized diagonal approximation. Medians are plotted in a 20% and 80% quantile envelope for all non-SVD results over 10 trials.
...and 1 more figures

Theorems & Definitions (80)

lemma 2.1
lemma 2.2
definition 1: Graph Laplacians and rescaled Laplacians
lemma 2.3: Complementary trace formulation for inverse Laplacians
theorem 1: Stochastic diagonal approximation
corollary 1.1: Stochastic ratio maximization
definition 2: Greedy, approximately greedy, and optimal subsets
theorem 2: Generalized linear programming bound
corollary 2.1: Constraints based on accumulated gain
corollary 2.2: Constraints based on initial gain
...and 70 more
