
Sample Compression Unleashed: New Generalization Bounds for Real Valued Losses

Mathieu Bazinet, Valentina Zantedeschi, Pascal Germain

TL;DR

This work addresses the challenge of obtaining generalization guarantees for real-valued and unbounded losses within the sample-compression paradigm. It develops a general PAC-Bayes–inspired bound for real-valued losses using a comparator framework. This yields Catoni-type and KL-based bounds, plus a sub-Gaussian bound for unbounded losses, none of which depends on the model size. Central to the approach is Pick-To-Learn (P2L), a model-agnostic meta-algorithm. P2L converts any predictor into a sample-compressed predictor by incrementally building a compression set and retraining on it, which enables tight, data-efficient generalization certificates for deep networks, random forests, and NLP models such as DistilBERT. Empirical results on Binary MNIST, MNIST, regression with trees, and Amazon polarity show non-vacuous, tight bounds that scale with the compression-set size rather than the parameter count. This highlights the practical value of certificate-based generalization for real-valued losses.
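The greedy loop described above can be sketched in Python. This is a minimal sketch that assumes the P2L scheme of paccagnan_pick_learn_2023 works as summarized here:

- start from an initial model;
- repeatedly add the training example with the largest loss to the compression set;
- retrain on the compression set only;
- stop once every remaining example has loss at most a threshold.

The names `pick_to_learn`, `train`, `loss` and `threshold` are hypothetical. The paper's variant may differ in how it initializes and when it stops.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Z = TypeVar("Z")  # a training example, e.g. an (x, y) pair
M = TypeVar("M")  # a trained predictor


def pick_to_learn(
    data: Sequence[Z],
    train: Callable[[List[Z]], M],
    loss: Callable[[M, Z], float],
    threshold: float,
) -> Tuple[M, List[int]]:
    """P2L-style greedy loop (sketch): add the worst-fitted example to the
    compression set, then retrain on the compression set only."""
    compression: List[int] = []
    model = train([])  # assumption: `train` accepts an empty set (initial model)
    while len(compression) < len(data):
        chosen = set(compression)
        remaining = [i for i in range(len(data)) if i not in chosen]
        # Example outside the compression set with the largest loss.
        worst = max(remaining, key=lambda i: loss(model, data[i]))
        if loss(model, data[worst]) <= threshold:
            break  # every remaining example is fitted well enough
        compression.append(worst)
        model = train([data[i] for i in compression])  # retrain on compression set
    return model, compression


# Toy usage: the "model" is the mean of the compression set, the loss is absolute error.
ys = [1.0, 1.2, 0.9, 5.0]
mean_model = lambda zs: sum(zs) / len(zs) if zs else 0.0
abs_loss = lambda m, y: abs(y - m)
model, comp = pick_to_learn(ys, mean_model, abs_loss, threshold=1.5)
print(comp)  # [3, 2, 0]
```

The final model depends only on the examples indexed by `comp`. Under this sketch's assumptions, the examples outside `comp` are therefore available to certify it.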

Abstract

Sample compression theory provides generalization guarantees for predictors that can be fully defined using a subset of the training dataset and a (short) message string, generally defined as a binary sequence. Previous works provided generalization bounds for the zero-one loss, which is restrictive, notably when applied to deep learning approaches. In this paper, we present a general framework for deriving new sample compression bounds that hold for real-valued unbounded losses. Using the Pick-To-Learn (P2L) meta-algorithm, which transforms the training method of any machine-learning predictor to yield sample-compressed predictors, we empirically demonstrate the tightness of the bounds and their versatility by evaluating them on random forests and multiple types of neural networks.
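To make "a subset of the training dataset and a (short) message string" concrete, here is a toy sample-compressed predictor. It is an illustrative example, not taken from the paper. A 1-D threshold classifier is reconstructed from one training example (the threshold) and a one-bit message (the orientation). The function name `reconstruct` is hypothetical. The zero-one loss used here is the setting of the earlier bounds mentioned above.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]  # (x, y) with a binary label y in {0, 1}


def reconstruct(compression_set: Sequence[Example], message: str) -> Callable[[float], int]:
    """Hypothetical reconstruction function for 1-D threshold classifiers.

    The compression set holds exactly one training example, whose x-value is the
    threshold; the one-bit message says whether points above it are labelled 1."""
    if len(compression_set) != 1 or message not in ("0", "1"):
        raise ValueError("expected one example and a one-bit message")
    threshold = compression_set[0][0]
    positive_above = message == "1"

    def predict(x: float) -> int:
        return int((x >= threshold) == positive_above)

    return predict


# Training sample S and a compression set given by indices into S.
S: List[Example] = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
indices = [2]
h = reconstruct([S[j] for j in indices], "1")

# Zero-one errors counted only on the examples outside the compression set.
complement = [z for j, z in enumerate(S) if j not in indices]
errors = sum(h(x) != y for x, y in complement)
print(errors)  # 0
```

The errors are counted only on the complement of the compression set, which is the quantity sample compression bounds are stated in terms of.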

Paper Structure

This paper contains 28 sections, 20 theorems, 78 equations, 2 figures, 14 tables, 2 algorithms.

Key Result

Theorem 1

For any distribution $\mathcal{D}$ over $\mathcal{X} \times \mathcal{Y}$, for any family of sets of messages $\{M(\mathbf{i}) \mid \mathbf{i} \in \ldots$ … with $\kappa = |\mathbf{i}^c|\,\widehat{R}_{S_{\mathbf{i}\ldots}}$ …
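The statement above is truncated, so the following is a sketch under an explicit assumption. It assumes Theorem 1 takes the classic binomial-tail-inversion form: the true risk is bounded by the largest $r$ such that $\Pr[\mathrm{Bin}(|\mathbf{i}^c|, r) \le \kappa] \ge \delta\,P_I(\mathbf{i})\,P_M(\sigma)$.

The symbols $\sigma$, $P_I$ and $P_M$, and the priors below, are illustrative assumptions:

- $P_I$ is uniform over compression-set sizes, then uniform over subsets of that size.
- $P_M$ is uniform over the $2^b$ messages of $b$ bits.

All function names are hypothetical. The computation runs in log space so that large $n$ does not overflow.

```python
import math


def log_binomial_cdf(k: int, m: int, r: float) -> float:
    """log P[Binomial(m, r) <= k] for 0 < r < 1, via log-sum-exp to avoid overflow."""
    log_terms = [
        math.lgamma(m + 1) - math.lgamma(j + 1) - math.lgamma(m - j + 1)
        + j * math.log(r) + (m - j) * math.log1p(-r)
        for j in range(k + 1)
    ]
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


def binomial_tail_inversion(k: int, m: int, log_delta: float, tol: float = 1e-10) -> float:
    """Largest r with P[Binomial(m, r) <= k] >= exp(log_delta), by bisection.

    For k < m this CDF decreases in r, so the feasible set is an interval [0, r*]."""
    if k >= m:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if log_binomial_cdf(k, m, mid) >= log_delta:
            lo = mid
        else:
            hi = mid
    return lo


def sample_compression_bound(n: int, compression_size: int, message_bits: int,
                             errors: int, delta: float) -> float:
    """Assumed form: Bin^{-1}(kappa, |i^c|, delta * P_I(i) * P_M(sigma))."""
    m = n - compression_size  # |i^c|: examples outside the compression set
    # Hypothetical priors: P_I uniform over sizes 0..n then over subsets; P_M uniform over 2**bits.
    log_p_i = -math.log(n + 1) - (math.lgamma(n + 1) - math.lgamma(compression_size + 1)
                                  - math.lgamma(m + 1))
    log_p_m = -message_bits * math.log(2)
    return binomial_tail_inversion(errors, m, math.log(delta) + log_p_i + log_p_m)


# Example: 1000 training points, 20 in the compression set, 5 errors on the other 980.
print(sample_compression_bound(1000, 20, 0, 5, delta=0.05))
```

In this form, the compression-set size enters only through $|\mathbf{i}^c|$ and the prior. The parameter count of the predictor plays no role.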

Figures (2)

  • Figure 1: Illustration of the behavior of the $\mathrm{kl}$ bound throughout P2L iterations for the five different seeds of the hyperparameter combination that achieved the minimal P2L bound on MNIST49 and MNIST56. We mark the minimal $\mathrm{kl}$ bound for each seed with a diamond ($\blacklozenge$). The results for the other datasets can be found in the appendix (figure fig:early_stop_appendix).
  • Figure 2: Illustration of the behavior of the $\mathrm{kl}$ bound throughout P2L iterations for the five different random seed initializations sharing the hyperparameter combination that achieved the minimal averaged P2L bound. The diamonds ($\blacklozenge$) mark the minimal $\mathrm{kl}$ bound for each run.
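The figures track a $\mathrm{kl}$ bound across P2L iterations. A minimal sketch of how such a value can be computed follows. It assumes a Maurer-style sample compression form for losses in $[0,1]$, with $m = n - |\mathbf{i}|$ points outside the compression set:

$$\mathrm{kl}(\widehat{R} \,\|\, R) \le \frac{1}{m}\left[\ln\frac{1}{P_I(\mathbf{i})P_M(\sigma)} + \ln\frac{2\sqrt{m}}{\delta}\right]$$

This inequality is then inverted numerically for $R$. The exact constants and priors in the paper may differ. The priors here ($P_I$ uniform over sizes and then subsets, $P_M$ uniform over $b$-bit messages), the trajectory values, and the function names are all hypothetical.

```python
import math


def kl_bernoulli(q: float, p: float) -> float:
    """kl(q || p) between Bernoulli(q) and Bernoulli(p), using the convention 0 ln 0 = 0."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # keep the logarithms finite
    out = 0.0
    if q > 0:
        out += q * math.log(q / p)
    if q < 1:
        out += (1 - q) * math.log((1 - q) / (1 - p))
    return out


def kl_inverse_upper(q: float, eps: float, tol: float = 1e-10) -> float:
    """Largest p in [q, 1] with kl(q || p) <= eps, by bisection (kl grows with p on [q, 1])."""
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kl_bernoulli(q, mid) <= eps:
            lo = mid
        else:
            hi = mid
    return lo


def kl_compression_bound(n: int, k: int, emp_loss: float, delta: float,
                         message_bits: int = 0) -> float:
    """Assumed form: kl(emp_loss || R) <= [ln 1/(P_I P_M) + ln(2 sqrt(m) / delta)] / m."""
    m = n - k  # examples outside the compression set
    if m == 0:
        return 1.0
    # Hypothetical priors: P_I uniform over sizes 0..n then over subsets; P_M uniform over 2**bits.
    log_inv_prior = (math.log(n + 1)
                     + math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(m + 1)
                     + message_bits * math.log(2))
    eps = (log_inv_prior + math.log(2 * math.sqrt(m) / delta)) / m
    return kl_inverse_upper(emp_loss, eps)


# Made-up (compression size, loss on the complement) pairs standing in for P2L iterations.
trajectory = [(0, 0.40), (50, 0.10), (100, 0.03), (200, 0.02), (400, 0.015)]
bounds = [kl_compression_bound(5000, k, loss, delta=0.05) for k, loss in trajectory]
best = min(range(len(bounds)), key=bounds.__getitem__)
print(trajectory[best][0], round(bounds[best], 3))
```

The trade-off in the figures shows up here as well. Adding points lowers the empirical loss but raises the complexity term. The minimum therefore tends to fall somewhere in the middle of the trajectory.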

Theorems & Definitions (30)

  • Example: shah2011feature
  • Theorem 1: shah_margin-sparsity_2005, Theorem 1
  • Theorem 2: paccagnan_pick_learn_2023, Theorem 4.2
  • Theorem 3
  • Proof sketch
  • Corollary 3
  • Proposition 4: germain2009pac, Proposition 2.1
  • Corollary 4
  • Corollary 4
  • Theorem 5
  • ...and 20 more
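The results summarized here include Catoni-type bounds. The following is a hedged sketch of one, not the paper's statement. It assumes the familiar Catoni form, with the usual KL term replaced by the compression complexity $\ln\frac{1}{P_I(\mathbf{i})P_M(\sigma)}$. It also assumes a loss in $[0,1]$, $m = n - |\mathbf{i}|$, and a constant $C > 0$ fixed before seeing the data:

$$R \le \frac{1 - \exp\left(-C\widehat{R} - \frac{1}{m}\left[\ln\frac{1}{P_I(\mathbf{i})P_M(\sigma)} + \ln\frac{1}{\delta}\right]\right)}{1 - e^{-C}}$$

The priors ($P_I$ uniform over sizes and then subsets, $P_M$ uniform over $b$-bit messages) and the function name are hypothetical.

```python
import math


def catoni_compression_bound(n: int, k: int, emp_loss: float, delta: float,
                             C: float, message_bits: int = 0) -> float:
    """Assumed Catoni-type form for a loss in [0, 1] (C > 0 fixed before seeing the data):
    R <= (1 - exp(-C * emp_loss - [ln 1/(P_I P_M) + ln 1/delta] / m)) / (1 - exp(-C))."""
    m = n - k  # examples outside the compression set
    if m == 0:
        return 1.0
    # Hypothetical priors: P_I uniform over sizes 0..n then over subsets; P_M uniform over 2**bits.
    log_inv_prior = (math.log(n + 1)
                     + math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(m + 1)
                     + message_bits * math.log(2))
    exponent = -C * emp_loss - (log_inv_prior + math.log(1 / delta)) / m
    # -expm1(x) computes 1 - exp(x) accurately when |x| is small.
    return -math.expm1(exponent) / -math.expm1(-C)


print(catoni_compression_bound(1000, 0, 0.1, 0.05, C=1.0))
```

$C$ must be chosen in advance. Trying several values of $C$ on the same data therefore requires splitting $\delta$ among them (a union bound).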