Have ASkotch: A Neat Solution for Large-scale Kernel Ridge Regression
Pratik Rathore, Zachary Frangella, Jiaming Yang, Michał Dereziński, Madeleine Udell
TL;DR
This work addresses the difficulty of scaling kernel ridge regression (KRR) to very large datasets by introducing ASkotch, an accelerated iterative solver for full KRR. ASkotch combines sketch-and-project updates with Nyström low-rank approximations and Nesterov acceleration to achieve linear convergence; under favorable kernel spectra, the convergence rate is nearly independent of the condition number. The authors base their convergence analysis on ridge leverage scores and determinantal point processes, which they use to bound projection shrinkage, and they show near-optimal, log-linear runtime for kernels with modest effective dimension. Empirically, ASkotch outperforms state-of-the-art methods for both full KRR and inducing-points KRR across 23 tasks, including a very large taxi dataset. The work opens the door to new large-scale applications of full KRR and suggests future directions in distributed, mixed-precision, and automated-parameter implementations. Mathematical concepts are presented with explicit notation, which supports precise adoption and extension in high-performance settings.
Abstract
Kernel ridge regression (KRR) is a fundamental computational tool, appearing in problems that range from computational chemistry to health analytics, and is of particular interest because of its central role in Gaussian process regression. However, full KRR solvers are challenging to scale to large datasets: both direct methods (e.g., Cholesky decomposition) and iterative methods (e.g., preconditioned conjugate gradient, PCG) incur prohibitive computational and storage costs. The standard approach to scaling KRR to large datasets chooses a set of inducing points and solves an approximate version of the problem, known as inducing-points KRR. However, the resulting solution tends to have worse predictive performance than the full KRR solution. In this work, we introduce a new solver, ASkotch, for full KRR that provides better solutions faster than state-of-the-art solvers for full and inducing-points KRR. ASkotch is a scalable, accelerated, iterative method for full KRR that provably obtains linear convergence. Under appropriate conditions, we show that ASkotch obtains condition-number-free linear convergence. This convergence analysis rests on the theory of ridge leverage scores and determinantal point processes. ASkotch outperforms state-of-the-art KRR solvers on a testbed of 23 large-scale regression and classification tasks derived from a wide range of application domains, demonstrating the superiority of full KRR over inducing-points KRR. Our work opens up the possibility of as-yet-unimagined applications of full KRR across a number of disciplines.
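To make the contrast between direct and iterative solvers concrete, the following is a minimal sketch, not the ASkotch algorithm itself. It assumes the usual full KRR formulation, in which one solves the linear system (K + lam I) w = y for an n-by-n kernel matrix K. The first routine is the direct Cholesky baseline. The second is a plain sketch-and-project iteration with uniformly sampled coordinate blocks; it omits the Nyström approximation, the Nesterov acceleration, and any leverage-score-based sampling mentioned above. All names here (rbf_kernel, krr_direct, krr_block_projection, lam, block_size) are illustrative choices and do not come from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Squared distances via ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def krr_direct(K, y, lam):
    # Direct baseline: Cholesky factorization of K + lam*I.
    # Costs O(n^3) time and O(n^2) memory, which is what limits scaling.
    n = K.shape[0]
    L = np.linalg.cholesky(K + lam * np.eye(n))  # K + lam*I = L @ L.T
    z = np.linalg.solve(L, y)
    return np.linalg.solve(L.T, z)

def krr_block_projection(K, y, lam, block_size=50, iters=2000, seed=0):
    # Unaccelerated sketch-and-project with coordinate-block sketches
    # (an assumption made for illustration; equivalent to randomized
    # block coordinate descent on the KRR linear system).
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    w = np.zeros(n)
    for _ in range(iters):
        B = rng.choice(n, size=block_size, replace=False)
        # Residual of (K + lam*I) w = y restricted to the rows in B
        r = K[B] @ w + lam * w[B] - y[B]
        # Exact solve with the small block_size x block_size submatrix
        K_BB = K[np.ix_(B, B)] + lam * np.eye(block_size)
        w[B] -= np.linalg.solve(K_BB, r)
    return w
```

Each iteration of the block method touches only block_size rows of K and solves a small system, so it never factorizes the full matrix. For clarity the sketch takes a precomputed K, but the needed rows could instead be formed on the fly.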
