
Randomly pivoted Cholesky: Practical approximation of a kernel matrix with few entry evaluations

Yifan Chen, Ethan N. Epperly, Joel A. Tropp, Robert J. Webber

TL;DR

A thorough new investigation of the empirical and theoretical behavior of the randomly pivoted Cholesky algorithm, which provably returns low‐rank approximations that are nearly optimal for matrix approximation problems that arise in scientific machine learning.

Abstract

The randomly pivoted partial Cholesky algorithm (RPCholesky) computes a factorized rank-k approximation of an N x N positive-semidefinite (psd) matrix. RPCholesky requires only (k + 1) N entry evaluations and O(k^2 N) additional arithmetic operations, and it can be implemented with just a few lines of code. The method is particularly useful for approximating a kernel matrix. This paper offers a thorough new investigation of the empirical and theoretical behavior of this fundamental algorithm. For matrix approximation problems that arise in scientific machine learning, experiments show that RPCholesky matches or beats the performance of alternative algorithms. Moreover, RPCholesky provably returns low-rank approximations that are nearly optimal. The simplicity, effectiveness, and robustness of RPCholesky strongly support its use in scientific computing and machine learning applications.
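The abstract notes that RPCholesky can be implemented in a few lines of code. The Python/NumPy sketch below illustrates that claim under stated assumptions. The psd matrix is accessed only through two hypothetical callables: `diag()`, which returns its diagonal, and `col(i)`, which returns its i-th column. This makes the (k + 1) N entry evaluations explicit. The function name `rpcholesky` and its signature are illustrative, not the authors' published code. Each step samples a pivot with probability proportional to the residual diagonal, forms the residual column, and downdates the diagonal at O(iN) cost. Summed over k steps, this gives the O(k^2 N) arithmetic quoted above.

```python
import numpy as np


def rpcholesky(diag, col, n, k, rng=None):
    """Randomly pivoted partial Cholesky (a minimal sketch, not the authors' code).

    diag : hypothetical callable returning the length-n diagonal of the psd
           matrix A (n entry evaluations).
    col  : hypothetical callable col(i) returning column i of A
           (n entry evaluations per call).
    Returns (F, pivots) with A approximately F @ F.T. At most k columns are
    requested, so at most (k + 1) * n entries of A are evaluated.
    """
    rng = np.random.default_rng(rng)
    F = np.zeros((n, k))
    d = np.array(diag(), dtype=float)  # residual diagonal, starts at diag(A)
    pivots = []
    for i in range(k):
        total = d.sum()
        if total <= 0:  # residual is zero: A is already captured exactly
            return F[:, :i], pivots
        # Sample a pivot with probability proportional to the residual diagonal
        s = int(rng.choice(n, p=d / total))
        pivots.append(s)
        # Residual column s: A[:, s] minus what the first i columns of F explain
        g = np.asarray(col(s), dtype=float) - F[:, :i] @ F[s, :i]
        F[:, i] = g / np.sqrt(g[s])
        # Downdate the residual diagonal; clip rounding noise below zero
        d = np.maximum(d - F[:, i] ** 2, 0.0)
    return F, pivots


if __name__ == "__main__":
    # Example: Gaussian kernel on 500 random 2-D points, accessed column by column
    pts = np.random.default_rng(0).standard_normal((500, 2))

    def kernel_col(i):
        return np.exp(-np.sum((pts - pts[i]) ** 2, axis=1) / 2)

    F, piv = rpcholesky(lambda: np.ones(len(pts)), kernel_col, len(pts), k=50, rng=1)
    # The dense matrix is formed here only to measure the approximation error
    A = np.exp(-((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1) / 2)
    print("relative trace-norm error:", np.trace(A - F @ F.T) / np.trace(A))
```

The callable interface is a design choice for the sketch. It means the kernel matrix never has to be formed, and the product F F^T coincides (up to rounding) with the column Nyström approximation built from the selected pivot columns.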

Paper Structure

This paper contains 53 sections, 13 theorems, 129 equations, 3 figures, 3 tables, and 7 algorithms.

Key Result

Theorem 2.3

Fix $r \in \mathbb{N}$ and $\varepsilon > 0$, and let $\boldsymbol{A}$ be a psd matrix. The column Nyström approximation $\boldsymbol{\widehat{A}}^{(k)}$ produced by RPCholesky (alg:rpcholesky) attains the error bound (eq:1+eps) provided that the number of columns, $k$, satisfies

Figures (3)

  • Figure 1: Rank-$k$ approximation of Gaussian kernel matrices. Median relative trace-norm error $\operatorname{tr}\bigl(\boldsymbol{A} - \boldsymbol{\widehat{A}}^{(k)}\bigr) / \operatorname{tr}\boldsymbol{A}$ and 20% to 80% quantile bars for several Nyström-based column approximation methods on the Smile (left) and Spiral (right) examples. Selected pivots (colored stars) and data points (gray circles) for the uniform, greedy, and RPCholesky methods are shown next to each panel.
  • Figure 2: Kernel ridge regression for QM9 data. Left: Prediction error (eq:smape) for several Nyström algorithms. Right: Relative trace-norm error.
  • Figure 3: Spectral clustering for alanine dipeptide trajectories. Top left: Misclassification rate, averaged over 1000 independent trials. Top right: Example of correct clustering ($<0.2\%$ misclassification) produced by RPCholesky with rank $k = 150$. Bottom: Incorrect clusterings ($>2\%$ misclassification) produced by uniform, RLS, and greedy sampling with rank $k = 150$. Black dots mark data points selected as pivots.
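Figure 1 compares pivot rules using the relative trace-norm error tr(A - Â^(k)) / tr(A). The sketch below shows one way such a comparison could be set up, under stated assumptions:

- a Gaussian kernel with a hypothetical bandwidth;
- synthetic Gaussian points standing in for the Smile and Spiral data sets;
- uniform sampling without replacement for the uniform method;
- largest-residual-diagonal selection for the greedy method.

The helper names (`gaussian_kernel`, `pivoted_partial_cholesky`, `relative_trace_error`) are illustrative. All three rules share one partial Cholesky loop and differ only in how the next pivot is chosen. The sketch makes no claim to reproduce the figure's numbers.

```python
import numpy as np


def gaussian_kernel(X, bandwidth):
    """Dense Gaussian kernel matrix exp(-||x_i - x_j||^2 / (2 * bandwidth^2))."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2 * bandwidth**2))


def pivoted_partial_cholesky(A, k, rule, rng):
    """Rank-k partial Cholesky of a psd matrix A under one of three pivot rules.

    "uniform":    pivots drawn uniformly at random without replacement
    "greedy":     pivot at the largest entry of the residual diagonal
    "rpcholesky": pivot sampled with probability proportional to the residual diagonal
    F @ F.T is the column Nystrom approximation built from the chosen pivots.
    """
    n = A.shape[0]
    F = np.zeros((n, k))
    d = np.diag(A).astype(float)  # residual diagonal (astype returns a copy)
    order = rng.permutation(n)  # consumed only by the uniform rule
    for i in range(k):
        if d.sum() <= 0:  # nothing left to approximate
            return F[:, :i]
        if rule == "uniform":
            s = int(order[i])
        elif rule == "greedy":
            s = int(np.argmax(d))
        elif rule == "rpcholesky":
            s = int(rng.choice(n, p=d / d.sum()))
        else:
            raise ValueError(f"unknown pivot rule: {rule}")
        g = A[:, s] - F[:, :i] @ F[s, :i]  # residual column s
        if g[s] <= 0:  # pivot already fully explained (possible for uniform): stop
            return F[:, :i]
        F[:, i] = g / np.sqrt(g[s])
        d = np.maximum(d - F[:, i] ** 2, 0.0)
    return F


def relative_trace_error(A, F):
    """tr(A - F F^T) / tr(A), the error measure reported in Figure 1."""
    return np.trace(A - F @ F.T) / np.trace(A)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 2))  # synthetic points, not the Smile/Spiral data
    A = gaussian_kernel(X, bandwidth=0.5)  # bandwidth is an arbitrary choice
    for rule in ("uniform", "greedy", "rpcholesky"):
        errs = [relative_trace_error(A, pivoted_partial_cholesky(A, 50, rule, rng))
                for _ in range(20)]
        lo, hi = np.quantile(errs, [0.2, 0.8])
        print(f"{rule:>10}: median {np.median(errs):.2e}, 20-80% range [{lo:.2e}, {hi:.2e}]")
```

Reporting the median with 20% and 80% quantiles over repeated trials mirrors how Figure 1 summarizes the randomized methods. The greedy rule is deterministic, so its trials coincide.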

Theorems & Definitions (19)

  • Theorem 2.3: Randomly pivoted Cholesky: simplified bound
  • Proposition 3.2: Deshpande et al. [DRVW06]
  • Theorem 5.1: Randomly pivoted Cholesky
  • Corollary 5.2: Randomly pivoted QR
  • Lemma 5.3: Expected residual
  • Lemma 5.4: Contraction rate
  • Lemma 5.5: Error doubling
  • Proof of thm:main_bound
  • Theorem C.1: Nyström lower bound [DV06, GS12]
  • Proof
  • ...and 9 more