No Dimensional Sampling Coresets for Classification

Meysam Alishahi, Jeff M. Phillips

TL;DR

The paper delivers dimension-free coresets for classification by combining sensitivity sampling with Rademacher complexity analysis, so that expected losses over distributions can be approximated without dependence on the ambient dimension $d$. It introduces an $s$-sensitivity framework with a new Rademacher-based bound and a simpler VC-based proof, yielding no-dimensional coresets of size $O(k^3/\varepsilon^2)$ for monotone losses and improved constants for logistic-type losses. The authors extend these results to distributional inputs, iid sampling, and a variety of losses (logistic, sigmoid, SVM, ReLU), and show that kernel methods via RKHS fit within the same framework, with KDE corollaries. The results give explicit sample-complexity bounds and dimension-free guarantees for regularized classification tasks, providing a theoretical foundation for data compression in large-scale learning.

Abstract

We refine and generalize what is known about coresets for classification problems via the sensitivity sampling framework. Such coresets seek the smallest possible subsets of input data, so one can optimize a loss function on the coreset and ensure approximation guarantees with respect to the original data. Our analysis provides the first no dimensional coresets, so the size does not depend on the dimension. Moreover, our results are general, apply for distributional input and can use iid samples, so provide sample complexity bounds, and work for a variety of loss functions. A key tool we develop is a Rademacher complexity version of the main sensitivity sampling approach, which can be of independent interest.
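
As a rough illustration of the generic sensitivity sampling idea mentioned in the abstract, the sketch below draws points with probability proportional to a sensitivity upper bound and reweights them so that the weighted coreset loss is an unbiased estimate of the mean loss on the full data. The function names, the synthetic data, and in particular the sensitivity proxy `s` (a uniform term plus a norm-based term) are assumptions made for this example; they are not the bounds or the algorithm from the paper.

```python
import numpy as np

def sensitivity_sample(s, m, rng):
    """Draw m indices i.i.d. with probability s_i / S; return indices and weights.

    The weights are chosen so that sum_j w_j * f(x_{idx_j}) is an unbiased
    estimate of the mean (1/n) * sum_i f(x_i) for any fixed function f.
    """
    s = np.asarray(s, dtype=float)
    n = len(s)
    S = s.sum()                       # total sensitivity
    p = s / S                         # sampling distribution
    idx = rng.choice(n, size=m, replace=True, p=p)
    w = 1.0 / (m * n * p[idx])        # importance weights
    return idx, w

def logistic_loss(X, y, theta):
    """Per-point logistic loss log(1 + exp(-y * <x, theta>)), computed stably."""
    return np.logaddexp(0.0, -y * (X @ theta))

# Synthetic data (assumption: purely illustrative).
rng = np.random.default_rng(0)
n, d, m = 5000, 50, 2000
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)

# Hypothetical sensitivity upper bound: uniform term plus a norm-based term.
norms = np.linalg.norm(X, axis=1)
s = 1.0 / n + norms / norms.sum()

idx, w = sensitivity_sample(s, m, rng)

# Compare the full mean loss with the weighted coreset estimate at one query.
theta = rng.normal(size=d)
full_loss = logistic_loss(X, y, theta).mean()
coreset_loss = np.sum(w * logistic_loss(X[idx], y[idx], theta))
rel_err = abs(full_loss - coreset_loss) / full_loss
print(f"full={full_loss:.4f} coreset={coreset_loss:.4f} rel_err={rel_err:.4f}")
```

With uniform sensitivities the scheme reduces to plain iid sampling with weights $1/m$. Non-uniform sensitivities shift sampling mass toward points that can dominate the loss for some query.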

Paper Structure

This paper contains 27 sections, 45 theorems, 163 equations, and 3 tables.

Key Result

Theorem 1.1

Let $(\mathcal{X}, P, \mathcal{F})$ be a positive definite tuple and $s(\cdot)$ be an upper sensitivity function with total sensitivity $S$. For any $t>0$, an $s$-sensitivity sample from $\mathcal{X}$ of size $m$, with probability at least $1-2\exp\left(-\frac{2mt^{2}}{S}\right)$, satisfies

Theorems & Definitions (78)

  • Theorem 1.1
  • Theorem 1.2
  • Definition 1.3
  • Theorem 1.4
  • Theorem 1.5: pmlr-v162-tolochinksy22a
  • Theorem 1.6
  • Proof: Sketch of proof of Theorem \ref{thm:main_rademacher1}.
  • Lemma 2.1
  • Proof: Proof of Theorem \ref{thm:main_braverman2022new}
  • Lemma 2.2
  • ...and 68 more