Boosting, Voting Classifiers and Randomized Sample Compression Schemes

Arthur da Cunha; Kasper Green Larsen; Martin Ritzert

TL;DR

The paper tackles the sample efficiency of voting classifiers produced by boosting, closing a long-standing gap: prior generalization bounds for voting classifiers carried at least two more logarithmic factors in the sample size $n$ than what is known to be achievable by general weak-to-strong learners. It introduces Sampling Boosting, a randomized boosting algorithm that trains weak learners on many small subsamples and forms the final classifier by averaging. Its output $\mathbf{g}$ satisfies a generalization bound with a single logarithmic dependency on $n$, namely $R_\mathcal{D}(\mathbf{g}) \le C \min\{ \frac{(d+\ln(1/\gamma))\ln(n/\delta)}{\gamma^4 n}, \frac{d \ln(n/d) \ln n}{\gamma^2 n} + \frac{\ln(1/\delta)}{n} \}$, where $C$ is a universal constant, $d$ is the VC dimension of the weak learner's hypothesis class, and $\gamma$ is the weak learner's advantage over random guessing. To support this, the authors develop a randomized compression framework that extends classical sample compression to randomized encodings and proves generalization bounds that scale with the compression size $s_n$, with sharper bounds when the scheme is additionally stable. They then construct a randomized compression scheme for the boosting algorithm, show via margin arguments that it fails only with small probability, and establish its stability to derive the final guarantee. The work thus gives a principled path to near-optimal weak-to-strong learning with voting classifiers and introduces techniques that may carry over to other randomized training procedures based on sub-sampling. The results inform the theory-driven design of sample-efficient boosting methods and clarify how sub-sampling interacts with ensemble generalization.
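As a rough illustration of the sub-sampling idea described in the TL;DR, the sketch below implements boosting by resampling, a standard AdaBoost variant and not the paper's Sampling Boosting: in each round a decision stump is fit to a small subsample drawn from the current weights, and the final classifier averages the stumps' weighted votes. The one-dimensional data, the stump learner, and the names `fit_stump`, `boost_by_resampling`, `vote`, `rounds` and `m` are all illustrative assumptions.

```python
import math
import random
from typing import List, Tuple

# Illustrative assumption: 1-D inputs and decision stumps as weak learners.
Stump = Tuple[float, int]  # (threshold, polarity)


def stump_predict(stump: Stump, x: float) -> int:
    """Predict `polarity` if x > threshold, else -polarity."""
    threshold, polarity = stump
    return polarity if x > threshold else -polarity


def fit_stump(xs: List[float], ys: List[int]) -> Stump:
    """Weak learner: the stump with the fewest mistakes on the (unweighted) subsample."""
    candidates = [min(xs) - 1.0] + sorted(set(xs))
    best, best_err = (candidates[0], 1), len(xs) + 1
    for thr in candidates:
        for pol in (1, -1):
            err = sum(stump_predict((thr, pol), x) != y for x, y in zip(xs, ys))
            if err < best_err:
                best, best_err = (thr, pol), err
    return best


def boost_by_resampling(xs: List[float], ys: List[int], rounds: int = 20,
                        m: int = 8, seed: int = 0) -> List[Tuple[float, Stump]]:
    """AdaBoost-style loop; each weak learner only sees m points drawn from the current weights."""
    rng = random.Random(seed)
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, stump)
    for _ in range(rounds):
        idx = rng.choices(range(n), weights=weights, k=m)  # small weighted subsample
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        preds = [stump_predict(stump, x) for x in xs]
        # Weighted error on the full sample under the current distribution.
        eps = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
        if eps >= 0.5:
            continue  # no advantage: discard this stump and resample
        eps = max(eps, 1e-10)  # avoid log(1/0) for a perfect stump
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, stump))
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def vote(ensemble: List[Tuple[float, Stump]], x: float) -> float:
    """Alpha-weighted average of the stump predictions, in [-1, 1]; its sign is the label."""
    total = sum(alpha for alpha, _ in ensemble)
    return sum(alpha * stump_predict(s, x) for alpha, s in ensemble) / total
```

For example, `vote(boost_by_resampling(xs, ys), x)` returns a value in [-1, 1] whose sign is the predicted label. Unlike the paper's algorithm, this sketch makes no attempt to achieve the improved sample complexity; it only shows weak learners trained on small subsamples being combined into one averaged vote.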

Abstract

In boosting, we aim to leverage multiple weak learners to produce a strong learner. At the center of this paradigm lies the concept of building the strong learner as a voting classifier, which outputs a weighted majority vote of the weak learners. While many successful boosting algorithms, such as the iconic AdaBoost, produce voting classifiers, their theoretical performance has long remained sub-optimal: The best known bounds on the number of training examples necessary for a voting classifier to obtain a given accuracy have so far always contained at least two logarithmic factors above what is known to be achievable by general weak-to-strong learners. In this work, we break this barrier by proposing a randomized boosting algorithm that outputs voting classifiers whose generalization error contains a single logarithmic dependency on the sample size. We obtain this result by building a general framework that extends sample compression methods to support randomized learning algorithms based on sub-sampling.
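To make the idea of sample compression concrete, here is a textbook compression scheme for one-dimensional threshold classifiers, together with a sub-sampling variant in which the kept set is a uniform random subsample of size s. This is a hedged toy example, not the paper's construction: the threshold class, the realizability assumption, and all function names are illustrative.

```python
import random
from typing import Callable, List, Tuple

# Illustrative assumption: 1-D threshold classifiers h(x) = +1 if x > t else -1,
# and samples that some threshold labels correctly (realizable case).
Example = Tuple[float, int]  # (x, label in {-1, +1})


def compress(sample: List[Example]) -> List[Example]:
    """Deterministic scheme of size <= 1: keep only the largest negative example."""
    negatives = [ex for ex in sample if ex[1] == -1]
    return [max(negatives)] if negatives else []


def reconstruct(kept: List[Example]) -> Callable[[float], int]:
    """Rebuild a threshold classifier from the compression set alone."""
    threshold = kept[0][0] if kept else float("-inf")
    return lambda x: 1 if x > threshold else -1


def randomized_compress(sample: List[Example], s: int, rng: random.Random) -> List[Example]:
    """Sub-sampling variant: the compression set is a uniform random subsample of size s."""
    return rng.sample(sample, s)


def reconstruct_from_subsample(subsample: List[Example]) -> Callable[[float], int]:
    """Run the same learner (compress, then reconstruct) on the subsample only."""
    return reconstruct(compress(subsample))
```

On `sample = [(0.5, -1), (1.2, -1), (2.0, 1), (3.1, 1)]`, `compress` keeps only `(1.2, -1)`, and the reconstructed threshold labels the whole sample correctly. In the randomized variant the rebuilt classifier is consistent with its subsample but may err on the points left out; bounding that error in terms of s is the kind of question a randomized compression framework addresses.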

Paper Structure

This paper contains 14 sections, 5 theorems, 25 equations.

Introduction
Weak-to-Strong Learning.
Contribution I: A New Voting Classifier.
Sample Compression Schemes
Contribution II: Randomized Compression Schemes.
Main Ideas in Algorithm \ref{alg:adaboost}
Other Related Work
Preliminaries
Generalization via Randomized Compression
Efficient Boosting via Randomized Compression
Corresponding Randomized Compression Scheme
Small Failure Probability
Stability
Conclusion

Key Result

Theorem 1.1

There exists a universal constant $C > 0$ for which the following holds. Let $\mathcal{D}$ be an unknown distribution over $\mathcal{X} \times \{-1,1\}$ and let $\mathbf{S} \sim \mathcal{D}^n$. Then for every $\delta>0$, it holds with probability at least $1-\delta$ over $\mathbf{S}$ and the randomness …
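The bound quoted in the TL;DR can be evaluated numerically to see which term of the minimum dominates for given parameters. The sketch below assumes C = 1 purely for illustration, since the constant is only stated to be universal, so its outputs indicate how the bound scales with n, d, γ and δ rather than actual error rates. The function name `sampling_boosting_bound` is hypothetical.

```python
import math


def sampling_boosting_bound(n: int, d: int, gamma: float, delta: float, C: float = 1.0) -> float:
    """Evaluate C * min{term_a, term_b} from the stated bound (C = 1 is an assumption)."""
    # First term: a single logarithmic factor ln(n / delta) in the sample size.
    term_a = (d + math.log(1 / gamma)) * math.log(n / delta) / (gamma ** 4 * n)
    # Second term: two logarithmic factors in n, but only gamma^2 in the denominator.
    term_b = d * math.log(n / d) * math.log(n) / (gamma ** 2 * n) + math.log(1 / delta) / n
    return C * min(term_a, term_b)


for gamma in (0.1, 0.4):
    print(gamma, sampling_boosting_bound(n=10**6, d=10, gamma=gamma, delta=0.05))
```

With these illustrative settings (n = 10^6, d = 10, δ = 0.05), the second term is the smaller one at γ = 0.1, while the first, single-logarithm term is smaller at γ = 0.4; for fixed d and γ the first term eventually wins as n grows, since it carries one fewer factor of ln n.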

Theorems & Definitions (8)

Theorem 1.1
Theorem 1.2
Proof of Theorem \ref{thm:genstable}
Lemma 3.1
Theorem 3.2 [llstalagrandvapnik71uniform]
Proof of Lemma \ref{lem:fmargin}
Lemma 3.3
Proof
