Samplability makes learning easier

Guy Blanc, Caleb Koch, Jane Lange, Carmen Strassle, Li-Yang Tan

TL;DR

The paper challenges the default PAC learning assumption by focusing on samplable distributions, showing that practical data-generating processes can dramatically reduce sample and time requirements. It introduces explicit evasive sets as a central tool, proving a statistical separation (exponential VC dimension yet polynomial learnability under samplable distributions) and a computational separation (under standard cryptographic assumptions, a class easy in samplable PAC but hard in standard PAC; a random-oracle variant reinforces the claim). The results extend to online learning against efficient adversaries and reveal a rich landscape of separations within samplable PAC, including explicit constructions via pseudorandom generators. Overall, the work suggests that incorporating samplability provides a more realistic, nuanced understanding of learning complexity and motivates characterizations that jointly consider function and distribution complexity. It also highlights deep connections between learning, sampling, and cryptographic hardness, with implications for both theory and practical learning scenarios.

Abstract

The standard definition of PAC learning (Valiant 1984) requires learners to succeed under all distributions, even ones that are intractable to sample from. This stands in contrast to samplable PAC learning (Blum, Furst, Kearns, and Lipton 1993), where learners only have to succeed under samplable distributions. We study this distinction and show that samplable PAC substantially expands the power of efficient learners. We first construct a concept class that requires exponential sample complexity in standard PAC but is learnable with polynomial sample complexity in samplable PAC. We then lift this statistical separation to the computational setting and obtain a separation relative to a random oracle. Our proofs center around a new complexity primitive, explicit evasive sets, that we introduce and study. These are sets for which membership is easy to determine but which are extremely hard to sample from. Our results extend to the online setting, similarly showing how its landscape changes when the adversary is assumed to be efficient instead of computationally unbounded.

Paper Structure

This paper contains 57 sections, 29 theorems, 81 equations, 3 figures.

Key Result

Theorem 1

There is a concept class that has exponential VC dimension, and hence requires exponential sample complexity in standard PAC, but is learnable with polynomial sample complexity, and in fact even in polynomial time, in samplable PAC.

Figures (3)

  • Figure 1: The left and right plots illustrate how the sample complexities of the learning tasks in Theorem 1 and Theorem 3, respectively, scale with the complexity of the distribution. See their formal versions for the quantitative parameters.
  • Figure 2: The weight distribution of $\mathcal{D}$ is illustrated by the 3 pink rectangles. The sizes of these rectangles depict the number of points and their shades depict the amount of weight. Since $\mathcal{D}$ places $0.9+\varepsilon$ weight on $H$, it $(0.9+\varepsilon)$-hits $H$. However, since $0.9$ amount of this weight is concentrated on the $k$ points in $H^*$, it $(\varepsilon',k)$-misses $H$ for any $\varepsilon' > \varepsilon$.
  • Figure 3: An illustration of how samplable distributions relate to real-world distributions.
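
The hit/miss distinction in the Figure 2 caption can be made concrete with a small numerical sketch. The definition used here is an assumed reading inferred from the caption alone, not the paper's formal statement: $\mathcal{D}$ is taken to $(\varepsilon,k)$-miss $H$ when discarding some set of at most $k$ points from $H$ leaves less than $\varepsilon$ weight. The function names and the toy distribution are hypothetical.

```python
from fractions import Fraction

def hit_weight(dist, H):
    """Total weight that dist places on the set H."""
    return sum((w for x, w in dist.items() if x in H), Fraction(0))

def residual_weight(dist, H, k):
    """Weight left on H after discarding its k heaviest points.

    Discarding the heaviest points is the best choice of at most k
    points to remove, so this is the smallest achievable residual.
    """
    weights = sorted((w for x, w in dist.items() if x in H), reverse=True)
    return sum(weights[k:], Fraction(0))

def misses(dist, H, eps, k):
    # ASSUMPTION: strict inequality, chosen to match the caption's
    # "for any eps' > eps"; the paper's definition may differ.
    return residual_weight(dist, H, k) < eps

# Toy distribution mirroring the caption with eps = 1/20 and k = 2:
# 0.9 weight on two heavy points of H, eps spread over five light
# points of H, and the remaining weight on a point outside H.
dist = {"h1": Fraction(9, 20), "h2": Fraction(9, 20), "out": Fraction(1, 20)}
dist.update({f"l{i}": Fraction(1, 100) for i in range(5)})
H = {x for x in dist if x != "out"}

print(hit_weight(dist, H))                 # weight on H
print(residual_weight(dist, H, 2))         # weight left without the heavy points
print(misses(dist, H, Fraction(1, 10), 2)) # eps' > eps
```

In this toy instance the distribution places $19/20$ of its weight on $H$, yet almost all of it sits on two points, so a learner that memorizes those two points sees little of the rest of $H$. Exact fractions are used to avoid floating-point comparisons at the threshold.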

Theorems & Definitions (82)

  • Theorem 1: See \ref{thm:statistical formal} for the formal version
  • Theorem 2: See \ref{thm:computational-body} for the formal version
  • Corollary 1.1: See \ref{thm:computational-sep-with-random-oracle} for the formal version
  • Remark 1.2: Evasive sets and uniform generation
  • Theorem 3: See \ref{thm:separations within samplable PAC formal} for the formal version
  • Definition 2.1: $\varepsilon$-miss
  • Definition 2.1: $(\varepsilon,k)$-miss
  • Remark 2.2: Comparison with TV distance
  • Definition 2.2: $(\varepsilon,k)$-evades size-$s$ distributions
  • Lemma 2.3: Existence of an evasive set
  • ...and 72 more