Majority-of-Three: The Simplest Optimal Learner?

Ishaq Aden-Ali, Mikael Møller Høgsgaard, Kasper Green Larsen, Nikita Zhivotovskiy

TL;DR

This work shows that the majority vote of three ERM classifiers achieves the optimal in-expectation error bound, which is provably unattainable by a single ERM classifier, and proves a near-optimal high-probability bound on its error.

Abstract

Developing an optimal PAC learning algorithm in the realizable setting, where empirical risk minimization (ERM) is suboptimal, was a major open problem in learning theory for decades. The problem was finally resolved by Hanneke a few years ago. Unfortunately, Hanneke's algorithm is quite complex as it returns the majority vote of many ERM classifiers that are trained on carefully selected subsets of the data. It is thus a natural goal to determine the simplest algorithm that is optimal. In this work we study the arguably simplest algorithm that could be optimal: returning the majority vote of three ERM classifiers. We show that this algorithm achieves the optimal in-expectation bound on its error which is provably unattainable by a single ERM classifier. Furthermore, we prove a near-optimal high-probability bound on this algorithm's error. We conjecture that a better analysis will prove that this algorithm is in fact optimal in the high-probability regime.
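
To make the algorithm concrete, here is a minimal sketch of a majority vote over three ERM classifiers. Several details are assumptions for illustration and are not stated in the abstract: the three ERMs are trained on three disjoint, equal-sized parts of the sample, and the hypothesis class is one-dimensional thresholds $x \mapsto \mathbb{1}[x \ge t]$. The names `erm_threshold` and `majority_of_three` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[float, int]          # (point x, label y in {0, 1})
Classifier = Callable[[float], int]


def erm_threshold(sample: Sequence[Example]) -> Classifier:
    """ERM for thresholds in the realizable setting (illustrative class).

    Returns a threshold classifier with zero training error: the threshold
    is the smallest positively labeled point, or +inf if there is none.
    """
    positives = [x for x, y in sample if y == 1]
    t = min(positives) if positives else float("inf")
    return lambda x: 1 if x >= t else 0


def majority_of_three(
    sample: Sequence[Example],
    erm: Callable[[Sequence[Example]], Classifier] = erm_threshold,
) -> Classifier:
    """Majority vote of three ERM classifiers.

    Assumption: the sample is split into three disjoint equal-sized parts
    (any leftover points beyond 3 * (len(sample) // 3) are ignored).
    """
    n = len(sample) // 3
    parts = [sample[0:n], sample[n:2 * n], sample[2 * n:3 * n]]
    f1, f2, f3 = (erm(part) for part in parts)
    # Predict 1 exactly when at least two of the three classifiers do.
    return lambda x: 1 if f1(x) + f2(x) + f3(x) >= 2 else 0


if __name__ == "__main__":
    # Labels are consistent with a target threshold somewhere in (0.3, 0.6].
    data = [(0.1, 0), (0.9, 1), (0.2, 0), (0.7, 1), (0.3, 0), (0.6, 1)]
    h = majority_of_three(data)
    print([h(x) for x in (0.1, 0.65, 0.7, 0.95)])
```

For thresholds, the vote behaves like the median of the three learned thresholds, so a single unlucky part of the sample cannot by itself determine the prediction.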

Paper Structure

This paper contains 12 sections, 16 theorems, 90 equations, 1 figure.

Key Result

Theorem 1.0

Fix a function class $\mathcal{F} \subseteq \{0,1\}^{\mathcal{X}}$ with VC dimension $d$. Fix a distribution $P$ over $\mathcal{X}$ and target function $f^\star \in \mathcal{F}$. For any ERM algorithm $\widehat{f} : \mathcal{X} \times \mathcal{Z}^* \to \{0,1\}$ it follows that

Figures (1)

  • Figure 1: An illustration of the partitioning of the interval $(0,1]$ for a training sample consisting of $m = 18$ points with $d=2$. The interval $(0,1]$ is partitioned into $4$ intervals $I_{1}, \dots, I_{4}$. Each interval $I_{i}$ is further partitioned into the $4$ subintervals $I_{(i,1)}, \dots, I_{(i,4)}$. The red points correspond to the first half of the sample $(X_1, \dots, X_{9} )$ and the blue points correspond to the second half of the sample $(X_{10}, \dots, X_{18} )$. The yellow highlighted regions are the first $d$ intervals $I_2$ and $I_4$ that contain no points from $(X_1, \dots, X_{9})$. The green highlighted regions are the first $d$ subintervals of $I_2$ and $I_4$ that contain no points from $(X_{10}, \dots, X_{18})$. The green intervals are added to the union of intervals used by $\widehat{f}_S$ as their indices correspond to the set $L_1(S)$.

Theorems & Definitions (26)

  • Theorem 1.0
  • Theorem 1.0
  • Conjecture 1.1
  • Theorem 1.1
  • Theorem 2.0
  • Lemma 2.1
  • Proof of \ref{expextationboundsection:theorem}
  • Lemma 2.1
  • Proof of \ref{lem:joint_mistake}
  • Theorem 3.0
  • ...and 16 more