No Dimensional Sampling Coresets for Classification

Meysam Alishahi; Jeff M. Phillips

TL;DR

The paper delivers dimension-free coresets for classification by combining sensitivity sampling with Rademacher complexity analysis, so that expected losses over distributions can be approximated without dependence on the ambient dimension $d$. It introduces an $s$-sensitivity framework with a new Rademacher-based bound and a simpler VC-based proof, yielding no-dimensional coresets of size $O(k^3/\varepsilon^2)$ for monotone losses and improved constants for logistic-type losses. The authors extend these results to distributional inputs, iid sampling, and a variety of losses (logistic, sigmoid, SVM, ReLU), and show that kernel methods fit within the same framework via an RKHS formulation, with KDE corollaries. The results give explicit sample-complexity bounds and dimension-free guarantees for regularized classification tasks, providing a theoretical foundation for data compression in large-scale learning.

Abstract

We refine and generalize what is known about coresets for classification problems via the sensitivity sampling framework. Such coresets seek the smallest possible subsets of input data, so one can optimize a loss function on the coreset and ensure approximation guarantees with respect to the original data. Our analysis provides the first no dimensional coresets, so the size does not depend on the dimension. Moreover, our results are general: they apply to distributional input and can use iid samples, so they provide sample complexity bounds, and they work for a variety of loss functions. A key tool we develop is a Rademacher complexity version of the main sensitivity sampling approach, which can be of independent interest.
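As a rough illustration of the sensitivity sampling idea the abstract refers to, the following minimal sketch uses the textbook form of the scheme on a finite dataset: each point is drawn with probability proportional to an upper sensitivity bound and reweighted so that the weighted loss is an unbiased estimate of the total loss. The function names `sensitivity_sample` and `weighted_loss` are hypothetical, the sensitivities are assumed to be given and positive, and this is not claimed to match the paper's exact $s$-sensitivity sample definition.

```python
import random


def sensitivity_sample(points, sensitivities, m, seed=0):
    """Draw m points i.i.d. with probability s(x)/S and weight S/(m*s(x)).

    Assumption: `sensitivities` holds a positive upper sensitivity bound per
    point; S is their sum (the total sensitivity).
    """
    rng = random.Random(seed)
    total = sum(sensitivities)  # total sensitivity S
    indices = rng.choices(range(len(points)), weights=sensitivities, k=m)
    # The weight S/(m*s(x)) makes the weighted sum unbiased for the full sum.
    return [(points[i], total / (m * sensitivities[i])) for i in indices]


def weighted_loss(sample, loss):
    """Estimate the total loss over the original data from a weighted sample."""
    return sum(weight * loss(point) for point, weight in sample)
```

With uniform sensitivities this reduces to plain uniform sampling with weights $n/m$; non-uniform sensitivities oversample the points that can dominate the loss for some classifier.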

Paper Structure (27 sections, 45 theorems, 163 equations, 3 tables)

This paper contains 27 sections, 45 theorems, 163 equations, 3 tables.

Introduction
Main Results.
Preliminaries
Our Contributions
Well-behaved distributions.
Connecting to coresets.
Bounds for specific $\phi$.
Sample complexity.
Proofs of Main Results
Proof of Theorem \ref{thm:main_rademacher1}
Proof of Theorem \ref{thm:main_braverman2022new}
Proof of Theorem \ref{thm:main_monotonic_coreset_R}
Proof of results in Table \ref{tab:main_mainresults_R}
Conclusion and Experimental results
Generalized Framework: Reproducing kernel Hilbert Space
...and 12 more sections

Key Result

Theorem 1.1

Let $(\mathcal{X}, P, \mathcal{F})$ be a positive definite tuple and $s(\cdot)$ be an upper sensitivity function with the total sensitivity $S$. For any $t>0$, an $s$-sensitivity sample from $\mathcal{X}$ of size $m$, with probability at least $1-2\exp \left(-{\frac{2mt^{2}}{S}}\right)$, satisfies
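The success probability stated above, $1-2\exp(-2mt^{2}/S)$, can be inverted to find a sample size: requiring $2\exp(-2mt^{2}/S) \le \delta$ gives $m \ge S\ln(2/\delta)/(2t^{2})$. The sketch below only performs this arithmetic; the helper names `sample_size` and `failure_bound` and the failure-probability parameter `delta` are assumptions introduced for illustration, not notation from the paper.

```python
import math


def failure_bound(total_sensitivity, t, m):
    """Failure probability 2*exp(-2*m*t^2/S) from the theorem's statement."""
    return 2.0 * math.exp(-2.0 * m * t ** 2 / total_sensitivity)


def sample_size(total_sensitivity, t, delta):
    """Smallest integer m such that 2*exp(-2*m*t^2/S) <= delta.

    `delta` is a hypothetical target failure probability in (0, 1).
    """
    if total_sensitivity <= 0 or t <= 0 or not (0 < delta < 1):
        raise ValueError("need S > 0, t > 0 and 0 < delta < 1")
    return math.ceil(total_sensitivity * math.log(2.0 / delta) / (2.0 * t ** 2))
```

The required size grows linearly in the total sensitivity $S$ and as $1/t^{2}$ in the deviation parameter, with no dependence on the dimension.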

Theorems & Definitions (78)

Theorem 1.1
Theorem 1.2
Definition 1.3
Theorem 1.4
Theorem 1.5: pmlr-v162-tolochinksy22a
Theorem 1.6
proof : Sketch of proof of Theorem \ref{thm:main_rademacher1}.
Lemma 2.1
proof : Proof of Theorem \ref{thm:main_braverman2022new}
Lemma 2.2
...and 68 more
