Scalable Binary CUR Low-Rank Approximation Algorithm

Bowen Su

TL;DR

The paper tackles scalable low-rank approximation for very large matrices by developing a Scalable Binary CUR algorithm that deterministically selects representative rows and columns in parallel. It combines a blockwise Adaptive Cross Approximation framework with a binary parallel selection mechanism to form the CUR factors $(C,U,R)$ efficiently, achieving a per-iteration cost of $\mathcal{O}(r\,nm/b)$ and practical speedups on multi-core hardware. Empirical results on Hilbert and synthetic low-rank matrices show near-optimal reconstruction as the target rank $r$ grows. Scalability experiments on $16384\times16384$ matrices show substantial speedups as the process count increases, although the speedups are sublinear because of parallel overhead. The approach offers a practical, deterministic route to accurate CUR-based low-rank approximations for large-scale data in applications requiring scalable matrix factorization.

Abstract

This paper proposes a scalable binary CUR low-rank approximation algorithm that leverages parallel selection of representative rows and columns within a deterministic framework. By employing a blockwise adaptive cross approximation strategy, the algorithm efficiently identifies dominant components in large-scale matrices, thereby reducing computational costs. Numerical experiments on $16{,}384 \times 16{,}384$ matrices demonstrate good speed-up, with execution time decreasing from $12.37$ seconds using $2$ processes to $1.02$ seconds using $64$ processes. Tests on Hilbert matrices and synthetic low-rank matrices of various sizes demonstrate near-optimal reconstruction accuracy.
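The timing figures quoted in the abstract can be turned into a relative speedup and a parallel efficiency with a few lines of arithmetic. The sketch below uses only the two reported data points; the helper name `speedup_and_efficiency` and the choice of the 2-process run as the baseline are assumptions made for illustration, not something the paper defines.

```python
def speedup_and_efficiency(t_base, p_base, t_new, p_new):
    """Relative speedup and efficiency between two parallel runs.

    speedup    = t_base / t_new
    efficiency = speedup / (p_new / p_base), i.e. the fraction of the
                 ideal (linear) speedup that was actually achieved.
    """
    speedup = t_base / t_new
    ideal = p_new / p_base
    return speedup, speedup / ideal


# Reported timings: 12.37 s on 2 processes, 1.02 s on 64 processes.
speedup, efficiency = speedup_and_efficiency(12.37, 2, 1.02, 64)
print(f"relative speedup: {speedup:.2f}x (ideal would be {64 // 2}x)")
print(f"parallel efficiency relative to the 2-process run: {efficiency:.1%}")
```

Under these assumptions the run with 32 times more processes is roughly 12 times faster, which is consistent with the sublinear scaling described in the TL;DR.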

Paper Structure

This paper contains 12 sections, 2 theorems, 7 equations, 4 figures, 3 algorithms.

Key Result

Theorem 1

Let $A$ be a matrix of size $m \times n$ and $r$ be an integer satisfying $1 \leq r < \min\{m, n\}$. Suppose that $I \subset \{1, \cdots, m\}$ with $\#(I) = r$ and $J \subset \{1, \cdots, n\}$ with $\#(J) = r$ such that $A_{I,J}$ has the maximal volume among all $r \times r$ submatrices of $A$. Then $\|A - A_r\|_C \leq (r+1)\,\sigma_{r+1}(A)$, where $A_r= A_{:,J} A_{I,J}^{\dagger} A_{I,:}$ is the $r$-cross approximation of $A$, $\sigma_{r+1}
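To make the $r$-cross approximation $A_r = A_{:,J} A_{I,J}^{\dagger} A_{I,:}$ concrete, here is a minimal NumPy sketch on a Hilbert matrix. Finding a true maximal-volume submatrix is computationally hard, so the sketch swaps in a simple greedy full-pivoting selection as a stand-in; it is sequential and is not the paper's blockwise binary parallel algorithm. The function names (`hilbert`, `greedy_cross_indices`, `cross_approximation`) and the sizes `n = 64`, `r = 8` are illustrative assumptions.

```python
import numpy as np


def hilbert(n):
    """n x n Hilbert matrix, H[i, j] = 1 / (i + j + 1) with 0-based indices."""
    i = np.arange(1, n + 1)
    return 1.0 / (i[:, None] + i[None, :] - 1.0)


def greedy_cross_indices(A, r):
    """Pick r row/column indices by greedy full pivoting on the residual.

    Assumption: a heuristic stand-in for the maximal-volume submatrix
    of the theorem, not the selection rule used in the paper.
    """
    residual = np.array(A, dtype=float)
    rows, cols = [], []
    for _ in range(r):
        i, j = np.unravel_index(np.argmax(np.abs(residual)), residual.shape)
        pivot = residual[i, j]
        if pivot == 0.0:  # residual is exactly zero: rank is exhausted
            break
        rows.append(int(i))
        cols.append(int(j))
        # Rank-one update that zeroes row i and column j of the residual.
        residual -= np.outer(residual[:, j], residual[i, :]) / pivot
    return rows, cols


def cross_approximation(A, rows, cols):
    """Form A[:, J] @ pinv(A[I, J]) @ A[I, :]."""
    C = A[:, cols]
    U = np.linalg.pinv(A[np.ix_(rows, cols)])
    R = A[rows, :]
    return C @ U @ R


n, r = 64, 8
A = hilbert(n)
rows, cols = greedy_cross_indices(A, r)
A_r = cross_approximation(A, rows, cols)

# Relative Frobenius error of the cross approximation.
rel_err = np.linalg.norm(A - A_r) / np.linalg.norm(A)

# Best possible rank-r relative error (Eckart-Young), for comparison.
sigma = np.linalg.svd(A, compute_uv=False)
best_err = np.sqrt(np.sum(sigma[r:] ** 2)) / np.linalg.norm(A)

print(f"cross approximation relative error: {rel_err:.3e}")
print(f"best rank-{r} relative error:        {best_err:.3e}")
```

Comparing `rel_err` with `best_err` gives a rough sense of how close a cross approximation built from only $r$ rows and $r$ columns comes to the truncated SVD on this test matrix.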

Figures (4)

  • Figure 1: CUR Approximation
  • Figure 2: Approximation error of the CUR algorithm for Hilbert matrices. The x-axis represents the number of selected rows/columns, and the y-axis shows the Frobenius norm relative error.
  • Figure 3: Relative Errors for Different Ranks
  • Figure 4: Caption

Theorems & Definitions (3)

  • Definition 1
  • Theorem 1: cf. goreinov2001maximal
  • Theorem 2: cf. allen2024maximal