On the Optimality of CVOD-based Column Selection
Maria Emelianenko, Guy B. Oldaker
TL;DR
The paper addresses scalable column subset selection by extending the CVOD framework so that CVOD and adaptCVOD can be paired with any column subset selection problem (CSSP) algorithm that returns linearly independent columns. It establishes quantitative links between the quality of the column partition and the resulting CSSP reconstruction error, with explicit bounds that tie the interpolative decomposition (ID) error to the partition energy and termination energy via the best rank-$r$ approximation $A_r$. The proposed PartitionedCSSP construction assembles a global CSSP solution by solving per-partition CSSP problems sequentially while projecting onto the nullspace of already-selected columns, which ensures that the selected columns have full column rank. These results offer principled guidance for choosing partitioning strategies in large-scale CSSP tasks and point to future work with other partitioners (e.g., VQPCA) and numerical studies.
Abstract
While there exists a rich array of matrix column subset selection problem (CSSP) algorithms for use with interpolative and CUR-type decompositions, their cost can become prohibitive as the size of the input matrix increases. To address this issue, the authors in \cite{emelianenko2024adaptive} developed a general framework that pairs a column-partitioning routine with a column-selection algorithm. Two of the four algorithms presented in that work paired the Centroidal Voronoi Orthogonal Decomposition (\textsf{CVOD}) and an adaptive variant (\textsf{adaptCVOD}) with the Discrete Empirical Interpolation Method (\textsf{DEIM}) \cite{sorensen2016deim}. In this work, we extend this framework and pair the \textsf{CVOD}-type algorithms with any CSSP algorithm that returns linearly independent columns. Our results include detailed error bounds for the solutions provided by these paired algorithms, as well as expressions that explicitly characterize how the quality of the selected column partition affects the resulting CSSP solution.
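The sequential construction summarized in the TL;DR can be illustrated with a small sketch. This is not the authors' PartitionedCSSP algorithm, only a minimal pure-Python illustration under stated assumptions: the partition is supplied by hand rather than computed by CVOD or adaptCVOD, the projection step is interpreted as removing from each column its component in the span of the columns already selected, and the per-partition selector `greedy_cssp` is a hypothetical stand-in (greedy largest-norm pivoting) for whichever CSSP algorithm is paired with the partitioner. The names `dot`, `residual`, `greedy_cssp`, and `partitioned_cssp` are introduced here for illustration only.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residual(col, basis):
    """Remove from col its components along the orthonormal vectors in basis."""
    r = list(col)
    for q in basis:
        c = dot(q, r)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r

def greedy_cssp(cols, k, tol):
    """Hypothetical stand-in CSSP: pick the largest-norm column, deflate, repeat."""
    cols = [list(c) for c in cols]
    picked = []
    for _ in range(min(k, len(cols))):
        norms = [math.sqrt(dot(c, c)) for c in cols]
        j = max(range(len(cols)), key=norms.__getitem__)
        if norms[j] <= tol:
            break  # nothing independent is left in this block
        q = [x / norms[j] for x in cols[j]]
        picked.append(j)
        cols = [residual(c, [q]) for c in cols]
    return picked

def partitioned_cssp(columns, partitions, k_per_part, cssp=greedy_cssp, tol=1e-10):
    """Select columns partition by partition (illustrative sketch only)."""
    basis = []     # orthonormal basis for the span of the selected columns
    selected = []  # global indices of the selected columns
    for part in partitions:
        # Assumption: project the block away from what is already selected.
        block = [residual(columns[j], basis) for j in part]
        for i in cssp(block, k_per_part, tol):
            j = part[i]
            r = residual(columns[j], basis)
            n = math.sqrt(dot(r, r))
            if n > tol:  # keep only columns that add a new direction
                selected.append(j)
                basis.append([x / n for x in r])
    return selected

# Six columns in R^3, split by hand into two partitions.
columns = [[1, 0, 0], [2, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 3], [1, 0, 0]]
partitions = [[0, 1, 2], [3, 4, 5]]
selected = partitioned_cssp(columns, partitions, 2)
print(selected)  # prints [1, 2, 4]
```

In this toy example the second partition contributes only column 4: columns 3 and 5 already lie in the span of the columns chosen from the first partition, so their projected residuals vanish and the selected set remains linearly independent.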
