Efficient Optimal PAC Learning

Mikael Møller Høgsgaard

TL;DR

This work analyzes the computational cost of optimal PAC learners in the realizable setting for hypothesis classes of finite VC-dimension $d$. It introduces an Efficient Optimal PAC Learner that uses a randomized AdaBoost-based subsampling scheme with ERM as a subroutine, achieving the optimal generalization bound $\mathcal{L}_{\mathcal{D}_c}(\hat{A})=O\big((d+\ln(1/\delta))/m\big)$ while attaining near-linear training time and logarithmic inference cost in $m$. Relative to prior optimal learners that rely on deterministic subsampling or bagging, the proposed method reduces inference complexity and gives a refined cost structure through AdaBoostSample-based voting over a carefully structured subsampling matrix $\mathcal{S}$. The approach uses uniform convergence and margin-based analyses to guarantee PAC optimality in the distribution-free setting, offering a scalable route to practical PAC learning where ERM costs dominate.

Abstract

Recent advances in the binary classification setting by Hanneke [2016b] and Larsen [2023] have resulted in optimal PAC learners. These learners leverage, respectively, a clever deterministic subsampling scheme and the classic heuristic of bagging (Breiman [1996]). Both optimal PAC learners use, as a subroutine, the natural algorithm of empirical risk minimization. Consequently, the computational cost of these optimal PAC learners is tied to that of the empirical risk minimizer algorithm. In this work, we seek to provide an alternative perspective on the computational cost imposed by the link to the empirical risk minimizer algorithm. To this end, we show the existence of an optimal PAC learner that offers a different tradeoff in terms of the computational cost induced by the empirical risk minimizer.

Paper Structure

This paper contains 21 sections, 23 theorems, 104 equations, 1 figure, 7 algorithms.

Key Result

Lemma 1

[vapnik74theory, Blumeruniformconvergence from Simons[Theorem 2]] For $0<\delta,\varepsilon<1$, hypothesis class $\mathcal{H}$ of VC-dimension $d$, target concept $c\in\mathcal{H}$, and distribution $\mathcal{D}$ over $\mathcal{X}$, we have with probability at least $1-\delta$ over $\mathbf{S}\sim \

Figures (1)

  • Figure 1: Outputs of the boosting algorithm on $8$ structured sub-training sequences, each producing a majority vote of $8$ voters, depicted by the lines coming out of $\mathcal{B}(\mathbf{S}_{i})$ (to avoid overloading the figure, only $h_{1,1}$ and $h_{1,8}$ are named on the lines). For most new examples $(\mathbf{x},\mathbf{y})$, with probability at least $3/4$ over the draw of $\mathbf{S}_{i}$, the majority vote $\mathcal{B}(\mathbf{S}_{i})$ has $3/4$ of its voters correct. A boosting call $\mathcal{B}(\mathbf{S}_{i})$ is marked with a green check mark if at least $3/4$ of its voters are correct on the new example $(\mathbf{x},\mathbf{y})$, and with a red cross otherwise. Likewise, a line ends in a green check mark if the corresponding voter is correct on $(\mathbf{x},\mathbf{y})$, and in a red cross if it is incorrect. For instance, in the call $\mathcal{B}(\mathbf{S}_{1})$ every voter except $h_{1,8}$ outputs $\mathbf{y}$ on $\mathbf{x}$, so $\mathcal{B}(\mathbf{S}_{1})(\mathbf{x})$ gets a green check mark, since $7/8\geq 3/4$ of its voters are correct.
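
The marking rule in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration of the rule as the caption states it, not the paper's algorithm: the function names `fraction_correct` and `mark_calls`, the $\pm 1$ label encoding, and the vote table below are all hypothetical. Only the first row mirrors the caption's example, where every voter except $h_{1,8}$ is correct.

```python
from typing import List


def fraction_correct(votes: List[int], label: int) -> float:
    """Fraction of voters in one boosting call whose prediction equals the label."""
    return sum(1 for v in votes if v == label) / len(votes)


def mark_calls(all_votes: List[List[int]], label: int,
               threshold: float = 0.75) -> List[bool]:
    """Mark a call True (green check) if at least `threshold` of its voters
    are correct on the example, else False (red cross)."""
    return [fraction_correct(votes, label) >= threshold for votes in all_votes]


# Hypothetical vote table: 8 boosting calls, 8 voters each, labels in {-1, +1}.
label = 1
all_votes = [
    [1, 1, 1, 1, 1, 1, 1, -1],     # as in the caption: only the 8th voter is wrong
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, -1, -1],    # exactly 3/4 correct still counts as a check
    [1, 1, 1, 1, 1, -1, -1, -1],   # 5/8 < 3/4: red cross
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, -1, -1, -1, -1],  # 4/8 < 3/4: red cross
    [1, 1, 1, 1, 1, 1, 1, -1],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

marks = mark_calls(all_votes, label)
print(marks)
```

With this made-up table, six of the eight calls pass the $3/4$ threshold. The table is fabricated for illustration and says nothing about the probabilities the paper proves.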

Theorems & Definitions (34)

  • Definition 1
  • Lemma 1
  • Theorem 1: Informal statement of the main theorem
  • Lemma 2
  • Lemma 3
  • Lemma 4: the induction-step lemma with $T=\emptyset$
  • Lemma 5
  • Theorem 2
  • Proof of the main theorem
  • Lemma 6
  • ...and 24 more