Boosting, Voting Classifiers and Randomized Sample Compression Schemes
Arthur da Cunha, Kasper Green Larsen, Martin Ritzert
TL;DR
The paper improves the sample efficiency of voting classifiers produced by boosting. It addresses a long-standing gap: the best prior bounds for voting classifiers carried at least two logarithmic factors more than what general weak-to-strong learners are known to achieve. The authors introduce Sampling Boosting, a randomized boosting algorithm that trains weak learners on many small subsamples and forms the final classifier by averaging. Its generalization bound has a single logarithmic dependency on the sample size $n$: $R_\mathcal{D}(\mathbf{g}) \le C \min\{ \frac{(d+\ln(1/\gamma))\ln(n/\delta)}{\gamma^4 n}, \frac{d \ln(n/d) \ln n}{\gamma^2 n} + \frac{\ln(1/\delta)}{n} \}$. To support this, the authors develop a randomized compression framework that extends classical sample compression to randomized encodings, and they prove generalization bounds that scale with the compression size $s_n$, especially when stability is assumed. They then show that the boosting algorithm admits a corresponding randomized compression scheme, bound its failure probability via margin arguments, and establish stability to derive tight generalization guarantees. The work thus provides a principled path to near-optimal weak-to-strong learning with voting classifiers, and its techniques may generalize to other randomized training procedures. The results bear on the theory-driven design of efficient boosting methods and on how sub-sampling interacts with ensemble generalization.
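The idea of training weak learners on small subsamples and averaging their votes can be illustrated with a minimal Python sketch. This is not the paper's Sampling Boosting algorithm: it uses uniform subsampling with replacement, an unweighted vote, and a one-dimensional threshold stump as a stand-in weak learner. All function names and parameters below are hypothetical and chosen only for illustration.

```python
import random


def train_stump(sample):
    """Hypothetical weak learner: best 1-D threshold stump on a small sample.

    `sample` is a list of (x, y) pairs with y in {-1, +1}.
    Returns (threshold, polarity); the stump predicts `polarity`
    when x >= threshold and `-polarity` otherwise.
    """
    best = None
    for threshold, _ in sample:
        for polarity in (-1, 1):
            errors = sum(
                1
                for x, y in sample
                if (polarity if x >= threshold else -polarity) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, polarity)
    return best[1], best[2]


def stump_predict(stump, x):
    threshold, polarity = stump
    return polarity if x >= threshold else -polarity


def train_subsampled_voters(data, num_voters, subsample_size, rng):
    """Train one stump per small subsample (drawn uniformly, with replacement)."""
    stumps = []
    for _ in range(num_voters):
        subsample = [rng.choice(data) for _ in range(subsample_size)]
        stumps.append(train_stump(subsample))
    return stumps


def vote(stumps, x):
    """Unweighted average of the voters' predictions, thresholded at zero."""
    average = sum(stump_predict(s, x) for s in stumps) / len(stumps)
    return 1 if average >= 0 else -1
```

For example, with `data = [(i / 100, 1 if i >= 50 else -1) for i in range(100)]`, calling `train_subsampled_voters(data, num_voters=25, subsample_size=10, rng=random.Random(0))` yields 25 stumps, each trained on only 10 points, whose combined vote is queried with `vote(stumps, x)`. An odd number of voters avoids ties in the average.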
Abstract
In boosting, we aim to leverage multiple weak learners to produce a strong learner. At the center of this paradigm lies the concept of building the strong learner as a voting classifier, which outputs a weighted majority vote of the weak learners. While many successful boosting algorithms, such as the iconic AdaBoost, produce voting classifiers, their theoretical performance has long remained sub-optimal: the best known bounds on the number of training examples necessary for a voting classifier to obtain a given accuracy have so far always contained at least two logarithmic factors above what is known to be achievable by general weak-to-strong learners. In this work, we break this barrier by proposing a randomized boosting algorithm that outputs voting classifiers whose generalization error contains a single logarithmic dependency on the sample size. We obtain this result by building a general framework that extends sample compression methods to support randomized learning algorithms based on sub-sampling.
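For concreteness, a voting classifier in the sense above can be written in the standard form below. The notation is an assumption made here for illustration and is not taken from the paper: $h_1, \dots, h_T$ denote weak learners with outputs in $\{-1, +1\}$, and $\alpha_1, \dots, \alpha_T \ge 0$ are their voting weights.

$$f(x) = \operatorname{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)$$

When all weights $\alpha_t$ are equal, this reduces to a plain majority vote over the weak learners.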
