Robust, randomized preconditioning for kernel ridge regression
Mateo Díaz, Ethan N. Epperly, Zachary Frangella, Joel A. Tropp, Robert J. Webber
TL;DR
This work addresses scalable kernel ridge regression (KRR) by introducing two randomized preconditioners for the conjugate gradient (CG) method: RPCholesky for full-data KRR and KRILL for restricted KRR. RPCholesky builds a low-rank, Nyström-like approximation $\hat{A}$ of the kernel matrix using random pivots and forms the preconditioner $P = \hat{A} + \mu I$. The total cost is $O(N^2)$, and the number of CG iterations is nearly constant under favorable eigenvalue decay, with rigorous guarantees stated in terms of the $\mu$-tail rank. KRILL uses a sparse random sign embedding to sketch the Gram matrix of centers and constructs $P = B^* B + \mu A(S,S)$. It converges robustly for any $\mu$ and any kernel under the stated embedding conditions, at a cost of $O((N + k^2) k \log k)$. Both methods perform well empirically on diverse datasets, including quantum chemistry HOMO energy prediction and SUSY particle detection, and their convergence guarantees make preconditioned CG more reliable for large-scale KRR.
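For orientation, the two linear systems and their preconditioners can be written out as follows. This is a sketch in one common formulation of KRR, not a quotation from the paper: the symbols $y$ (training labels), $\beta$ and $\hat{\beta}$ (coefficient vectors), $\Phi$ (the sparse sign embedding), and the column-selection notation $A(:,S)$ are assumptions introduced here for illustration.

$$
\begin{aligned}
\text{Full-data KRR:} \quad & (A + \mu I)\,\beta = y, & P &= \hat{A} + \mu I, \\
\text{Restricted KRR:} \quad & \bigl(A(:,S)^* A(:,S) + \mu A(S,S)\bigr)\,\hat{\beta} = A(:,S)^* y, & P &= B^* B + \mu A(S,S), \quad B = \Phi\, A(:,S).
\end{aligned}
$$

In both cases CG is applied to the system on the left, and each iteration additionally solves a linear system with the cheaper matrix $P$ on the right.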
Abstract
This paper investigates preconditioned conjugate gradient techniques for solving kernel ridge regression (KRR) problems with a medium to large number of data points ($10^4 \leq N \leq 10^7$), and it describes two methods with the strongest guarantees available. The first method, RPCholesky preconditioning, accurately solves the full-data KRR problem in $O(N^2)$ arithmetic operations, assuming sufficiently rapid polynomial decay of the kernel matrix eigenvalues. The second method, KRILL preconditioning, offers an accurate solution to a restricted version of the KRR problem involving $k \ll N$ selected data centers at a cost of $O((N + k^2) k \log k)$ operations. The proposed methods efficiently solve a range of KRR problems, making them well-suited for practical applications.
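The RPCholesky idea can be illustrated with a small NumPy sketch: build a rank-$k$ randomly pivoted partial Cholesky factor $F$ with $\hat{A} = F F^*$, apply $P^{-1} = (\hat{A} + \mu I)^{-1}$ through a thin SVD of $F$, and run preconditioned CG on $(A + \mu I)\beta = y$. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`rpcholesky`, `make_preconditioner`, `pcg`), the Gaussian kernel, the synthetic data, and the values of `N`, `k`, and `mu` are all hypothetical choices made for the example, and the kernel matrix is formed densely, which is only sensible at this toy scale.

```python
import numpy as np

def rpcholesky(A, k, rng):
    """Rank-k randomly pivoted partial Cholesky: returns F with A ~= F @ F.T."""
    N = A.shape[0]
    F = np.zeros((N, k))
    d = np.diag(A).astype(float).copy()      # diagonal of the current residual
    trace0 = d.sum()
    for i in range(k):
        if d.sum() <= 1e-10 * trace0:        # residual is numerically zero: stop early
            return F[:, :i]
        s = rng.choice(N, p=d / d.sum())     # pivot sampled proportionally to residual diagonal
        g = A[:, s] - F[:, :i] @ F[s, :i]    # column s of the residual matrix
        F[:, i] = g / np.sqrt(g[s])
        d = np.maximum(d - F[:, i] ** 2, 0.0)
    return F

def make_preconditioner(F, mu):
    """Return a function applying (F F^T + mu I)^{-1} via a thin SVD of F."""
    U, s, _ = np.linalg.svd(F, full_matrices=False)
    w = 1.0 / (s ** 2 + mu) - 1.0 / mu       # correction on the range of F
    return lambda v: U @ (w * (U.T @ v)) + v / mu

def pcg(matvec, b, apply_pinv, tol=1e-8, maxiter=400):
    """Standard preconditioned conjugate gradient; returns (solution, iterations)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_pinv(r)
    p = z.copy()
    rz = r @ z
    for it in range(1, maxiter + 1):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = apply_pinv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter

# Toy problem (assumed for illustration): Gaussian kernel on random 2-D points.
rng = np.random.default_rng(0)
N, k, mu = 400, 50, 1e-3
X = rng.standard_normal((N, 2))
sq = np.sum(X ** 2, axis=1)
A = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / 2.0)
y = np.sin(X[:, 0])

F = rpcholesky(A, k, rng)
apply_pinv = make_preconditioner(F, mu)
beta, iters = pcg(lambda v: A @ v + mu * v, y, apply_pinv)
print("PCG iterations:", iters)
print("relative residual:", np.linalg.norm(A @ beta + mu * beta - y) / np.linalg.norm(y))
```

Forming the factor costs $O(k^2 N)$ operations and each application of $P^{-1}$ costs $O(kN)$, so the dense matrix-vector product with $A$ at $O(N^2)$ per iteration dominates, which is consistent with the $O(N^2)$ total when the iteration count stays bounded.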
