The Many Faces of Optimal Weak-to-Strong Learning

Mikael Møller Høgsgaard; Kasper Green Larsen; Markus Engelund Mathiasen

TL;DR

This work presents a new and surprisingly simple Boosting algorithm that obtains a provably optimal sample complexity and suggests that this new algorithm might outperform previous algorithms on large data sets.

Abstract

Boosting is an extremely successful idea, allowing one to combine multiple low accuracy classifiers into a much more accurate voting classifier. In this work, we present a new and surprisingly simple Boosting algorithm that obtains a provably optimal sample complexity. Sample optimal Boosting algorithms have only recently been developed, and our new algorithm has the fastest runtime among all such algorithms and is the simplest to describe: Partition your training data into 5 disjoint pieces of equal size, run AdaBoost on each, and combine the resulting classifiers via a majority vote. In addition to this theoretical contribution, we also perform the first empirical comparison of the proposed sample optimal Boosting algorithms. Our pilot empirical study suggests that our new algorithm might outperform previous algorithms on large data sets.
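The recipe in the abstract (split the data into 5 disjoint equal-size pieces, run AdaBoost on each, take a majority vote) can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the 1-D threshold stump used as the weak learner, the function names (`train_stump`, `adaboost`, `majority_of_5`), the number of rounds, and the choice to discard leftover points when the sample size is not divisible by 5 are all assumptions made for this example.

```python
import math


def stump_predict(x, threshold, sign):
    """Predict `sign` when x >= threshold and `-sign` otherwise."""
    return sign if x >= threshold else -sign


def train_stump(xs, ys, weights):
    """Hypothetical weak learner: the 1-D threshold stump with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best  # (weighted error, threshold, sign)


def adaboost(xs, ys, rounds=10):
    """Standard AdaBoost with labels in {-1, +1}; returns a voting classifier."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # entries are (alpha, threshold, sign)
    for _ in range(rounds):
        err, threshold, sign = train_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1 - 1e-12)  # clamp to keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, sign))
        # Increase the weight of misclassified points, decrease the rest, renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]

    def voter(x):
        score = sum(a * stump_predict(x, t, s) for a, t, s in ensemble)
        return 1 if score >= 0 else -1

    return voter


def majority_of_5(xs, ys, rounds=10):
    """Run AdaBoost on 5 disjoint equal-size pieces and combine them by majority vote."""
    size = len(xs) // 5  # assumption: leftover points (fewer than 5) are dropped
    voters = [adaboost(xs[k * size:(k + 1) * size],
                       ys[k * size:(k + 1) * size], rounds)
              for k in range(5)]

    def g(x):
        # Five votes in {-1, +1} sum to an odd number, so there are no ties.
        return 1 if sum(v(x) for v in voters) > 0 else -1

    return g


# Toy data: a deterministic permutation of {0.00, ..., 0.99}, labeled by a threshold at 0.5.
xs = [((37 * i) % 100) / 100 for i in range(100)]
ys = [1 if x >= 0.5 else -1 for x in xs]
g = majority_of_5(xs, ys)
print(g(0.9), g(0.1))
```

The pieces are taken as contiguous slices, which matches the i.i.d. setting of a training set drawn from $\mathcal{D}^m$; for data stored in a meaningful order, shuffling before the split would be the natural adjustment.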

Paper Structure (12 sections, 5 theorems, 30 equations, 2 figures, 4 algorithms)

Introduction
Weak-to-Strong Learning.
Sample Complexity.
Other Performance Metrics.
Our Contributions
Empirical Comparison.
Previous Optimal Weak-to-Strong Learners
Analysis of Majority-of-5
Formal Analysis
Preliminaries.
Analysis.
Experiments

Key Result

Theorem 1

For any distribution $\mathcal{D}$ over $\mathcal{X} \times \{-1,1\}$ and any $\gamma$-weak learner $\mathcal{W}$ using a hypothesis set $\mathcal{H}$ of VC-dimension $d$, it holds for a training set $S \sim \mathcal{D}^m$ that running Majority-of-5 on $S$ to obtain a hypothesis $g$ satisfies

Figures (2)

Figure 1: Top plot is Higgs. Left plot is Boone. Right plot is Forest Cover.
Figure 2: Left plot is Diabetes. Right plot is Adversarial

Theorems & Definitions (8)

Theorem 1
Corollary 2
Lemma 2
Proof of Corollary \ref{cor:upperbound}
Lemma 2
Proof of \ref{lem16}
Corollary 2
Proof of \ref{lem15}
