Supervised Kernel Thinning

Albert Gong; Kyuseong Choi; Raaz Dwivedi

TL;DR

The paper addresses the computational bottlenecks of two kernel-based regression methods, Nadaraya-Watson (NW) regression and kernel ridge regression (KRR), by using Kernel Thinning (KT) to compress the training data into a small coreset. It defines two supervised estimators on that coreset, KT-NW and KT-KRR, which train and predict faster than their full-data counterparts while retaining provable error guarantees. The authors give guarantees, including MSE rates, for both finite- and infinite-dimensional kernels, and validate the approach on simulations and real datasets, where it shows favorable tradeoffs against standard thinning baselines. The result is a practical route to scalable kernel regression with principled compression and rigorous error control.

Abstract

The kernel thinning algorithm of Dwivedi & Mackey (2024) provides a better-than-i.i.d. compression of a generic set of points. By generating high-fidelity coresets of size significantly smaller than the input points, KT is known to speed up unsupervised tasks like Monte Carlo integration, uncertainty quantification, and non-parametric hypothesis testing, with minimal loss in statistical accuracy. In this work, we generalize the KT algorithm to speed up supervised learning problems involving kernel methods. Specifically, we combine two classical algorithms--Nadaraya-Watson (NW) regression or kernel smoothing, and kernel ridge regression (KRR)--with KT to provide a quadratic speed-up in both training and inference times. We show how distribution compression with KT in each setting reduces to constructing an appropriate kernel, and introduce the Kernel-Thinned NW and Kernel-Thinned KRR estimators. We prove that KT-based regression estimators enjoy significantly superior computational efficiency over the full-data estimators and improved statistical efficiency over i.i.d. subsampling of the training data. En route, we also provide a novel multiplicative error guarantee for compressing with KT. We validate our design choices with both simulations and real data experiments.
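To make the computational setting concrete, here is a minimal sketch of the two full-data estimators the abstract names, NW regression and KRR, next to the same estimators fit on a small subset of the training data. It does not implement KT: the subset is a uniform i.i.d. subsample, which is the baseline the abstract compares against, and a KT coreset would take its place. The Gaussian kernel, the bandwidth, the ridge parameter, the toy data, and the subset size `m = sqrt(n)` are all illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=0.2):
    """Gaussian kernel matrix between rows of A (n, d) and rows of B (m, d)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def nw_predict(X_train, y_train, X_query, bandwidth=0.2):
    """Nadaraya-Watson estimate: kernel-weighted average of training labels."""
    K = gaussian_kernel(X_query, X_train, bandwidth)
    weights_sum = K.sum(axis=1)
    # Guard against division by zero when a query is far from all points.
    return (K @ y_train) / np.maximum(weights_sum, 1e-12)


def krr_fit_predict(X_train, y_train, X_query, bandwidth=0.2, reg=1e-3):
    """Kernel ridge regression: solve (K + n*reg*I) alpha = y, then predict."""
    n = X_train.shape[0]
    K = gaussian_kernel(X_train, X_train, bandwidth)
    alpha = np.linalg.solve(K + n * reg * np.eye(n), y_train)
    return gaussian_kernel(X_query, X_train, bandwidth) @ alpha


# Toy 1-D regression problem (illustrative data, not from the paper).
rng = np.random.default_rng(0)
n = 1024
X = rng.uniform(-1.0, 1.0, size=(n, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(n)

# Assumption: subset of size sqrt(n), chosen uniformly at random.
# A KT coreset would replace this i.i.d. subsample.
m = int(np.sqrt(n))
idx = rng.choice(n, size=m, replace=False)
X_sub, y_sub = X[idx], y[idx]

X_test = np.linspace(-0.9, 0.9, 50)[:, None]
f_true = np.sin(3.0 * X_test[:, 0])

for name, (Xs, ys) in {"full": (X, y), "subsample": (X_sub, y_sub)}.items():
    nw_mse = np.mean((nw_predict(Xs, ys, X_test) - f_true) ** 2)
    krr_mse = np.mean((krr_fit_predict(Xs, ys, X_test) - f_true) ** 2)
    print(f"{name:>9}: n={len(ys):4d}  NW MSE={nw_mse:.4f}  KRR MSE={krr_mse:.4f}")
```

The cost structure is visible in the code: NW prediction touches every training point per query, and KRR solves a linear system whose size equals the number of training points, so shrinking the training set from `n` to `m` points shrinks both costs. The open question, and the one the paper addresses, is how to pick those `m` points so that accuracy degrades less than it does under i.i.d. subsampling.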
