Sparse Robust Classification via the Kernel Mean

Brendan van Rooyen, Aditya Krishna Menon, Robert C. Williamson

TL;DR

The paper advocates a mean-based kernel classifier $f(x)=\mathrm{sign}\left(\frac{1}{n}\sum_i y_i K(x_i,x)\right)$ and shows it arises from empirical risk minimization under a linear loss, connecting it to SVM, MMD, and KDE. It establishes robustness to symmetric label noise, including a formal immunity result, and derives margin-based bounds linking approximation error to classification performance. The work then develops a scalable sparsification framework via clustered subsampling that yields sparse kernel means with provable error bounds and parallelizable computation. Empirical results corroborate the sparsity and robustness claims, demonstrating practical utility on synthetic and real datasets. Overall, the approach delivers a simple, robust, and scalable kernel-classification paradigm with strong theoretical guarantees.

Abstract

Many leading classification algorithms output a classifier that is a weighted average of kernel evaluations. Optimizing these weights is a nontrivial problem that still attracts much research effort. Furthermore, explaining these methods to the uninitiated is a difficult task. Letting all the weights be equal leads to a conceptually simpler classification rule, one that requires little effort to motivate or explain: the mean. Here we explore the consistency, robustness, and sparsification of this simple classification rule.
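As a rough illustration of the equal-weight rule $f(x)=\mathrm{sign}\left(\frac{1}{n}\sum_i y_i K(x_i,x)\right)$, the sketch below evaluates it in NumPy on toy data. The Gaussian kernel, the bandwidth parameter `gamma`, the function names, and the two-cluster data are all assumptions made for this example and are not taken from the paper; ties at a score of zero are broken toward $+1$, which is likewise an arbitrary choice.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    # Pairwise K(a, b) = exp(-gamma * ||a - b||^2).
    # The Gaussian kernel is an assumed choice; any kernel could be used here.
    sq = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_mean_predict(X_train, y_train, X_test, gamma=1.0):
    # Score each test point by the label-weighted mean of kernel evaluations:
    # (1/n) * sum_i y_i K(x_i, x), with every training point weighted equally.
    K = gaussian_kernel(X_train, X_test, gamma)      # shape (n_train, n_test)
    scores = (y_train[:, None] * K).mean(axis=0)     # shape (n_test,)
    # Ties (score exactly 0) are assigned to +1; this is an assumption.
    return np.where(scores >= 0, 1, -1)

# Toy data (hypothetical): two well-separated Gaussian clusters.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=2.0, scale=0.5, size=(50, 2))
X_neg = rng.normal(loc=-2.0, scale=0.5, size=(50, 2))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(50), -np.ones(50)])

X_test = np.array([[2.0, 2.0], [-2.0, -2.0]])
preds = kernel_mean_predict(X, y, X_test)
print(preds)
```

No weights are fitted: prediction is a single pass over the kernel evaluations, which is what makes the rule easy to explain and, as the TL;DR notes, a natural target for sparsification.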

Paper Structure

This paper contains 44 sections, 22 theorems, 136 equations, 3 figures, 1 table, 2 algorithms.

Key Result

Lemma 3

For all distributions $P$ and for all $f \in {[-1,1]^{X}}$,

Figures (3)

  • Figure 1: Checkerboard data set, illustrating the utility of clustered sub-sampling. See text.
  • Figure 2: Sparse Approximation of the checkerboard data set. See text.
  • Figure 3: Mean classifier performance on Long and Servedio data set.

Theorems & Definitions (26)

  • Definition 1
  • Definition 2
  • Lemma 3: Steinwart:2008 theorem 2.31
  • Lemma 4: Sriperumbudur2009
  • Theorem 5: Altun2006
  • Theorem 6
  • Theorem 7
  • Corollary 8
  • Lemma 9
  • Lemma 10
  • ...and 16 more