CUR Matrix Approximation through Convex Optimization for Feature Selection
Kathryn Linehan, Radu Balan
TL;DR
This work introduces SF CUR, a deterministic CUR matrix approximation built on convex optimization that selects columns and rows of a data matrix $X$ separately with user-specified counts $c$ and $r$, and provides convergence guarantees. By employing a surrogate-functional framework, the algorithm solves regularized least-squares problems and enforces exact selection counts via bisection over penalties, with the CUR factor $U$ derived from a pseudo-inverse update. The authors establish theoretical convergence to a fixed point and to a minimizer of the CUR objective, and analyze computational complexity and generalizations to alternative norms. Empirically, SF CUR is evaluated on a document-term matrix and gene expression data, demonstrating competitive reconstruction accuracy among CUR methods and notable interpretability benefits for feature selection. A novel application showcases CUR as a feature selector for discriminant protein analysis within self-organizing maps, highlighting the method’s versatility in bioinformatics and clustering contexts.
Abstract
The singular value decomposition (SVD) is commonly used in applications requiring a low-rank matrix approximation. However, the singular vectors cannot be interpreted in terms of the original data. For applications requiring this type of interpretation, e.g., selection of important data matrix columns or rows, the approximate CUR matrix factorization can be used. Work on the CUR matrix approximation has generally focused on algorithm development, theoretical guarantees, and applications. In this work, we present a novel deterministic CUR formulation and algorithm with theoretical convergence guarantees. The algorithm utilizes convex optimization, finds important columns and rows separately, and allows the user to control the number of important columns and rows selected from the original data matrix. We present numerical results and demonstrate the effectiveness of our CUR algorithm as a feature selection method on gene expression data. We compare these results with those obtained using the SVD and other CUR algorithms for feature selection. Lastly, we present a novel application of CUR as a feature selection method to determine discriminant proteins when clustering protein expression data in a self-organizing map (SOM), and compare the performance of multiple CUR algorithms in this application.
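To make the structure of a CUR approximation concrete, the following is a minimal NumPy sketch, not the SF CUR algorithm itself. It assumes the column and row indices have already been chosen and forms $U$ as $C^{+} X R^{+}$, the standard Frobenius-optimal choice for fixed $C$ and $R$. The helper names `cur_from_indices` and `top_norm_indices` are hypothetical, and the norm-based selector is only a placeholder for the convex-optimization selection step described above.

```python
import numpy as np


def cur_from_indices(X, col_idx, row_idx):
    """Build C, U, R from chosen column and row indices of X.

    U = pinv(C) @ X @ pinv(R) minimizes ||X - C U R||_F for fixed C and R.
    """
    C = X[:, col_idx]   # actual columns of X, so they stay interpretable
    R = X[row_idx, :]   # actual rows of X
    U = np.linalg.pinv(C) @ X @ np.linalg.pinv(R)
    return C, U, R


def top_norm_indices(X, k, axis):
    """Placeholder selector (an assumption, NOT the paper's method).

    axis=0 returns the k columns with the largest Euclidean norm,
    axis=1 returns the k rows with the largest Euclidean norm.
    """
    norms = np.linalg.norm(X, axis=axis)
    return np.sort(np.argsort(norms)[-k:])


# Synthetic data matrix of exact rank 3 (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 15))

c, r = 3, 3  # user-specified numbers of columns and rows
col_idx = top_norm_indices(X, c, axis=0)
row_idx = top_norm_indices(X, r, axis=1)

C, U, R = cur_from_indices(X, col_idx, row_idx)
rel_err = np.linalg.norm(X - C @ U @ R) / np.linalg.norm(X)
print("selected columns:", col_idx)
print("selected rows:", row_idx)
print("relative Frobenius error:", rel_err)
```

Because the synthetic matrix has exact rank 3 and three columns and three rows are selected, the relative error here should be close to zero. On real data the error depends heavily on which columns and rows are selected, which is the part that a method such as SF CUR addresses.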
