Sparse Robust Classification via the Kernel Mean
Brendan van Rooyen, Aditya Krishna Menon, Robert C. Williamson
TL;DR
The paper advocates a mean-based kernel classifier $f(x)=\mathrm{sign}\left(\frac{1}{n}\sum_{i=1}^{n} y_i K(x_i,x)\right)$ and shows that it arises from empirical risk minimization under a linear loss, connecting it to support vector machines (SVM), maximum mean discrepancy (MMD), and kernel density estimation (KDE). It establishes robustness to symmetric label noise, including a formal immunity result, and derives margin-based bounds linking the error in approximating the kernel mean to classification performance. The work then develops a scalable sparsification framework based on clustered subsampling, which yields sparse kernel means with provable error bounds and parallelizable computation. Experiments on synthetic and real datasets corroborate the sparsity and robustness claims.
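A brief sketch of why symmetric label noise leaves this rule unchanged in expectation (an illustration under assumed notation, not the paper's formal statement): suppose each label is flipped independently with probability $\rho < 1/2$, giving noisy labels $\tilde{y}_i$. The symbols $\rho$ and $\tilde{y}_i$ are introduced here for illustration only. Then

$$\mathbb{E}\left[\tilde{y}_i \mid y_i\right] = (1-\rho)\,y_i + \rho\,(-y_i) = (1-2\rho)\,y_i,$$

and, conditioning on the clean sample,

$$\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} \tilde{y}_i K(x_i,x)\right] = (1-2\rho)\,\frac{1}{n}\sum_{i=1}^{n} y_i K(x_i,x).$$

Since $1-2\rho > 0$, the expected noisy score is a positive multiple of the clean score, so its sign, and hence the predicted label, is the same.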
Abstract
Many leading classification algorithms output a classifier that is a weighted average of kernel evaluations. Optimizing these weights is a nontrivial problem that still attracts much research effort. Furthermore, explaining these methods to the uninitiated is a difficult task. Letting all the weights be equal leads to a conceptually simpler classification rule, one that requires little effort to motivate or explain: the mean. Here we explore the consistency, robustness and sparsification of this simple classification rule.
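To make the equal-weight rule concrete, the following is a minimal sketch of a kernel mean classifier in NumPy. The Gaussian kernel, the bandwidth parameter `gamma`, the function names, and the toy data are all assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """Pairwise Gaussian kernel exp(-gamma * ||a - b||^2) (assumed kernel choice)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    sq = np.maximum(sq, 0.0)  # guard against tiny negative values from rounding
    return np.exp(-gamma * sq)

def kernel_mean_classify(X_train, y_train, X_test, gamma=1.0):
    """Predict sign( (1/n) * sum_i y_i K(x_i, x) ) for each test point x."""
    K = gaussian_kernel(X_train, X_test, gamma)       # shape (n_train, n_test)
    scores = (y_train[:, None] * K).mean(axis=0)      # equal weights: a plain mean
    return np.sign(scores)

# Hypothetical toy data: two well-separated Gaussian blobs with labels +1 / -1.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=2.0, scale=0.5, size=(50, 2))
X_neg = rng.normal(loc=-2.0, scale=0.5, size=(50, 2))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(50), -np.ones(50)])

X_test = np.array([[2.0, 2.0], [-2.0, -2.0]])
pred = kernel_mean_classify(X, y, X_test)
```

No weights are optimized here: training amounts to storing the labeled sample, and prediction is a single averaged kernel evaluation per test point.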
