On the Optimality of CVOD-based Column Selection

Maria Emelianenko, Guy B. Oldaker

TL;DR

The paper addresses scalable solutions to the column subset selection problem (CSSP) by extending the CVOD framework so that CVOD and adaptCVOD can be paired with any CSSP algorithm that returns linearly independent columns. It establishes quantitative links between the quality of the column partition and the resulting CSSP reconstruction error, with explicit bounds that tie the interpolative decomposition (ID) error to the partition energy and termination energy via the best rank-$r$ approximation $A_r$. The proposed PartitionedCSSP construction assembles a global CSSP solution by solving per-partition CSSP problems sequentially while projecting onto the nullspace of the already-selected columns, which ensures full column rank. These results offer principled guidance for choosing partitioning strategies in large-scale CSSP tasks and point to future work with other partitioners (e.g., VQPCA) and numerical studies.
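The sequential construction described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the paper's PartitionedCSSP algorithm: the names `greedy_select` and `partitioned_cssp` are hypothetical, the per-block solver is a greedy largest-residual-norm rule standing in for "any CSSP algorithm", and "projecting onto the nullspace of already-selected columns" is interpreted here as projecting each block onto the orthogonal complement of the span of the columns chosen so far.

```python
import numpy as np

def greedy_select(M, num_cols):
    """Hypothetical stand-in CSSP solver: repeatedly pick the column with the
    largest residual norm, then deflate (assumes the residual stays nonzero)."""
    R = M.astype(float).copy()
    picks = []
    for _ in range(num_cols):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))
        picks.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(q, q @ R)  # remove the chosen direction from every column
    return picks

def partitioned_cssp(A, partition, counts):
    """Select counts[i] columns from block i, after projecting the block onto
    the orthogonal complement of the columns selected from earlier blocks."""
    selected = []
    for block, d_i in zip(partition, counts):
        block = np.asarray(block)
        A_block = A[:, block]
        if selected:
            # Orthonormal basis for the span of the columns chosen so far.
            Q, _ = np.linalg.qr(A[:, selected])
            A_block = A_block - Q @ (Q.T @ A_block)
        local = greedy_select(A_block, d_i)
        selected.extend(block[local].tolist())
    return np.array(selected)

# Toy data: three equal-sized blocks of columns, two picks per block.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30))
partition = [np.arange(0, 10), np.arange(10, 20), np.arange(20, 30)]
counts = [2, 2, 2]

cols = partitioned_cssp(A, partition, counts)
C = A[:, cols]
rank_C = np.linalg.matrix_rank(C)
```

Because each block is deflated against the earlier picks before its own selection runs, a column that lies in the span of already-chosen columns has zero residual norm and is not selected again, which is the mechanism behind the full-column-rank claim in this toy setting.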

Abstract

While there exists a rich array of matrix column subset selection problem (CSSP) algorithms for use with interpolative and CUR-type decompositions, their use can often become prohibitive as the size of the input matrix increases. In an effort to address these issues, the authors in \cite{emelianenko2024adaptive} developed a general framework that pairs a column-partitioning routine with a column-selection algorithm. Two of the four algorithms presented in that work paired the Centroidal Voronoi Orthogonal Decomposition (\textsf{CVOD}) and an adaptive variant (\textsf{adaptCVOD}) with the Discrete Empirical Interpolation Method (\textsf{DEIM}) \cite{sorensen2016deim}. In this work, we extend this framework and pair the \textsf{CVOD}-type algorithms with any CSSP algorithm that returns linearly independent columns. Our results include detailed error bounds for the solutions provided by these paired algorithms, as well as expressions that explicitly characterize how the quality of the selected column partition affects the resulting CSSP solution.
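As a point of reference for the error bounds mentioned above, the following sketch compares the interpolative decomposition error $\|A - CC^\dagger A\|_F$ for a column subset with the best rank-$r$ error $\|A - A_r\|_F$ obtained from the singular values. The helper names `id_error` and `best_rank_r_error`, the synthetic test matrix, and the arbitrary choice of the first $r$ columns are assumptions made for illustration; none of them come from the paper.

```python
import numpy as np

def id_error(A, cols):
    """Frobenius error of projecting A onto the span of the chosen columns."""
    C = A[:, cols]
    return np.linalg.norm(A - C @ (np.linalg.pinv(C) @ A), "fro")

def best_rank_r_error(A, r):
    """Frobenius error of the best rank-r approximation (Eckart-Young)."""
    s = np.linalg.svd(A, compute_uv=False)
    return np.sqrt(np.sum(s[r:] ** 2))

# Synthetic matrix: approximately rank 8 plus small noise.
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 8)) @ rng.standard_normal((8, 25))
A += 0.01 * rng.standard_normal((40, 25))

r = 5
cols = list(range(r))          # arbitrary subset, not an optimized selection
err_id = id_error(A, cols)
err_opt = best_rank_r_error(A, r)
ratio = err_id / err_opt       # always >= 1, since C C^+ A has rank at most r
```

The ratio `err_id / err_opt` is the kind of quantity that CSSP error bounds control: a good column selection keeps it close to 1, while a poor one lets it grow.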

Paper Structure

This paper contains 10 sections, 8 theorems, 38 equations, and 5 algorithms.

Key Result

Lemma 1

Let $A \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(A) = \rho$, and let $0 < r < \rho$ be a desired target rank. Let $C \in \mathbb{R}^{m \times r}$ be the matrix resulting from any of the partition-based DEIM algorithms with an initial column partition of size $k$ and multi-index $d = (d_1, \dots$ [...] where $\gamma_C = \max_i \|(I_m - C_i C_i^\dagger) V_i\|_F^2 \, \sigma_\rho^{-2}$ and $\sigma_1 \ge \sigma$ [...]

Theorems & Definitions (13)

  • Lemma 1
  • Theorem 1
  • Theorem 2
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Theorem 3
  • proof
  • ...and 3 more