Boosting, Voting Classifiers and Randomized Sample Compression Schemes

Arthur da Cunha, Kasper Green Larsen, Martin Ritzert

TL;DR

The paper improves the sample efficiency of voting classifiers produced by boosting. The best previously known bounds for voting classifiers carried at least two logarithmic factors beyond what general weak-to-strong learners are known to achieve. The paper introduces Sampling Boosting, a randomized boosting algorithm that trains weak learners on many small subsamples and forms the final classifier by averaging. The resulting generalization bound has a single logarithmic dependency on the sample size $n$: $R_\mathcal{D}(\mathbf{g}) \le C \min\{ \frac{(d+\ln(1/\gamma))\ln(n/\delta)}{\gamma^4 n}, \frac{d \ln(n/d) \ln n}{\gamma^2 n} + \frac{\ln(1/\delta)}{n} \}$. To support this, the authors develop a randomized compression framework that extends classical sample compression to randomized encodings and proves generalization bounds that scale with the compression size $s_n$, with sharper bounds when the scheme is stable. They then show that the boosting algorithm admits such a randomized compression scheme, bound its failure probability via margin arguments, and establish stability to derive tight generalization guarantees. The work thus provides a principled path to near-optimal weak-to-strong learning for voting classifiers and introduces techniques that may generalize to other randomized training procedures. The results have implications for the theory-driven design of efficient boosting methods and for understanding how sub-sampling interacts with ensemble generalization.
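To get a feel for how the two branches of the bound in the TL;DR compare, the following minimal sketch evaluates each term numerically. It is only an illustration: the function name `sampling_boosting_bound` is hypothetical, the universal constant $C$ is not specified in the text and is set to 1 here purely as a placeholder, and the input checks (in particular $0 < \gamma < 1/2$, the usual range for a weak learner's advantage) are assumptions rather than conditions quoted from the paper.

```python
import math


def sampling_boosting_bound(n, d, gamma, delta, C=1.0):
    """Evaluate the two branches of the TL;DR bound.

    Returns (first, second, C * min(first, second)).
    C = 1.0 is a placeholder: the paper only states that some
    universal constant C > 0 exists.
    """
    # Assumed validity ranges so that every logarithm is well defined.
    if not (n > d > 0):
        raise ValueError("need n > d > 0")
    if not (0 < gamma < 0.5):
        raise ValueError("need 0 < gamma < 1/2 (assumed range)")
    if not (0 < delta < 1):
        raise ValueError("need 0 < delta < 1")

    # First branch: (d + ln(1/gamma)) * ln(n/delta) / (gamma^4 * n)
    first = (d + math.log(1 / gamma)) * math.log(n / delta) / (gamma ** 4 * n)

    # Second branch: d * ln(n/d) * ln(n) / (gamma^2 * n) + ln(1/delta) / n
    second = (
        d * math.log(n / d) * math.log(n) / (gamma ** 2 * n)
        + math.log(1 / delta) / n
    )

    return first, second, C * min(first, second)


if __name__ == "__main__":
    for n in (10 ** 5, 10 ** 6, 10 ** 7):
        first, second, bound = sampling_boosting_bound(n, d=10, gamma=0.1, delta=0.05)
        print(f"n={n}: first={first:.4g}, second={second:.4g}, bound={bound:.4g}")
```

The first branch has one logarithm in $n$ but a $\gamma^{-4}$ dependence, while the second has two logarithms and a $\gamma^{-2}$ dependence, so which branch attains the minimum depends on the relative sizes of $n$, $d$, and $\gamma$.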

Abstract

In boosting, we aim to leverage multiple weak learners to produce a strong learner. At the center of this paradigm lies the concept of building the strong learner as a voting classifier, which outputs a weighted majority vote of the weak learners. While many successful boosting algorithms, such as the iconic AdaBoost, produce voting classifiers, their theoretical performance has long remained sub-optimal: the best known bounds on the number of training examples necessary for a voting classifier to obtain a given accuracy have so far always contained at least two logarithmic factors above what is known to be achievable by general weak-to-strong learners. In this work, we break this barrier by proposing a randomized boosting algorithm that outputs voting classifiers whose generalization error contains a single logarithmic dependency on the sample size. We obtain this result by building a general framework that extends sample compression methods to support randomized learning algorithms based on sub-sampling.
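The notion of a voting classifier described in the abstract can be illustrated with a short sketch. This is not the paper's algorithm: the threshold stumps, the weights, and the names `stump` and `vote` are hypothetical, chosen only to show how a weighted majority vote over $\{-1, 1\}$-valued weak hypotheses is computed.

```python
def stump(threshold):
    """Hypothetical weak hypothesis: +1 if x exceeds the threshold, else -1."""
    return lambda x: 1 if x > threshold else -1


def vote(hypotheses, weights, x):
    """Weighted majority vote: the sign of the weighted sum of predictions.

    Ties (a score of exactly 0) are broken towards +1 here; that
    tie-breaking rule is an arbitrary choice for this sketch.
    """
    score = sum(w * h(x) for h, w in zip(hypotheses, weights))
    return 1 if score >= 0 else -1


# Illustrative ensemble of three stumps with non-negative weights.
hypotheses = [stump(0.0), stump(1.0), stump(2.0)]
weights = [0.4, 0.35, 0.25]

if __name__ == "__main__":
    for x in (-1.0, 0.5, 1.5, 3.0):
        print(x, vote(hypotheses, weights, x))
```

At `x = 0.5` only the first stump predicts +1, and it is outvoted by the combined weight of the other two, which is the sense in which the final prediction is a weighted majority rather than the output of any single weak learner.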

Paper Structure

This paper contains 14 sections, 5 theorems, and 25 equations.

Key Result

Theorem 1.1

There exists a universal constant $C > 0$ for which the following holds. Let $\mathcal{D}$ be an unknown distribution over $\mathcal{X} \times \{-1,1\}$ and let $\mathbf{S} \sim \mathcal{D}^n$. Then for every $\delta>0$, it holds with probability at least $1-\delta$ over $\mathbf{S}$ and the randomness ...

Theorems & Definitions (8)

  • Theorem 1.1
  • Theorem 1.2
  • Proof of Theorem \ref{thm:genstable}
  • Lemma 3.1
  • Theorem 3.2: llstalagrandvapnik71uniform
  • Proof of Lemma \ref{lem:fmargin}
  • Lemma 3.3
  • Proof