The Sample Complexity of Replicable Realizable PAC Learning

Kasper Green Larsen, Markus Engelund Mathiasen, Chirag Pabbaraju, Clement Svendsen

TL;DR

We construct a particularly hard learning problem and prove a sample complexity lower bound whose dependence on the size of the hypothesis class $H$ is close to $(\log|H|)^{3/2}$.

Abstract

We consider the problem of replicable realizable PAC learning. We construct a particularly hard learning problem and prove a sample complexity lower bound whose dependence on the size of the hypothesis class $H$ is close to $(\log|H|)^{3/2}$. Our proof uses several novel techniques: we define a particular Cayley graph associated with $H$ and analyze a suitable random walk on this graph through the spectral properties of its adjacency matrix. Furthermore, we show an almost matching upper bound for the lower bound instance, so any stronger lower bound would have to come from a different instance of the problem.

Paper Structure (25 sections, 31 theorems, 125 equations, 1 figure)

Key Result

Theorem 1.2

For any integer $d \geq 10^{11}$ and positive reals $\varepsilon,\delta,\rho \leq 10^{-4}$, there exist a domain ${\mathcal{X}}$ and a hypothesis class ${\mathcal{H}} \subseteq\{0, 1\}^{\mathcal{X}}$ with VC-dimension $d$ such that for any algorithm ${\mathcal{A}}$ there is a distribution ${\mathcal{D}}$ [...] labeled samples from ${\mathcal{D}}$ in order to be a $\rho$-replicable PAC learner for ${\mathcal{H}}$ [...]

Figures (1)

  • Figure 1: Example with $k = 7$ for hypothesis $h_i$ for $i = (0, 5, 2)$. Each interval shows which values of $b$ will make $h_i((a, b)) = 1$. For instance, if $a = 1$ then $h_i((a, b)) = 1$ for $b \in \{0, 5, 6\}$.
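The Figure 1 caption can be checked with a small sketch. The caption gives only one concrete data point ($a = 1$ yields $b \in \{0, 5, 6\}$), so the rule used here is an assumption inferred from that example, not the paper's definition: $h_i((a, b)) = 1$ exactly when $b$ lies in a cyclic interval of three values starting at $i_a$ modulo $k$. The name `h`, the `width` parameter, and the 0-based indexing of $a$ are all hypothetical.

```python
def h(i, a, b, k=7, width=3):
    """Assumed form of h_i((a, b)), inferred from the Figure 1 caption only.

    Returns 1 iff b lies in the cyclic interval {i[a], i[a]+1, ..., i[a]+width-1} mod k.
    The interval width (3) and 0-based indexing of a are assumptions.
    """
    return 1 if (b - i[a]) % k < width else 0


k = 7
i = (0, 5, 2)

# For each a, list the values of b that the assumed hypothesis labels 1.
ones = {a: [b for b in range(k) if h(i, a, b, k=k)] for a in range(len(i))}

for a, bs in ones.items():
    print(f"a = {a}: h_i((a, b)) = 1 for b in {bs}")
```

Under these assumptions, the row for $a = 1$ is $\{0, 5, 6\}$, matching the caption: the interval starts at $i_1 = 5$ and wraps around modulo $7$.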

Theorems & Definitions (55)

  • Definition 1.1: $\rho$-replicability [reproducibility_in_learning]
  • Theorem 1.2: Replicable Learning Lower Bound
  • Theorem 1.3: Replicable Learning Upper Bound
  • Theorem 3.1: Replicable Learning Lower Bound
  • Definition 3.1: Mode
  • Lemma 3.1
  • Theorem 3.2
  • Lemma 3.2
  • Lemma 3.3: Random step
  • Lemma 3.3
  • ...and 45 more