Samplability makes learning easier
Guy Blanc, Caleb Koch, Jane Lange, Carmen Strassle, Li-Yang Tan
TL;DR
The paper revisits the standard PAC requirement that learners succeed under every distribution and restricts attention to samplable distributions, showing that this restriction can dramatically reduce sample and time requirements. Its central tool is the explicit evasive set, which yields a statistical separation (a class with exponential VC dimension that is nonetheless learnable with polynomially many samples under samplable distributions) and a computational separation (under standard cryptographic assumptions, a class that is easy to learn in samplable PAC but hard in standard PAC; a random-oracle variant reinforces the claim). The results extend to online learning against efficient adversaries and reveal a rich landscape of separations within samplable PAC, including explicit constructions via pseudorandom generators. Overall, the work suggests that accounting for samplability gives a more realistic picture of learning complexity and motivates characterizations that jointly consider function and distribution complexity. It also highlights connections between learning, sampling, and cryptographic hardness, with implications for both theory and practical learning scenarios.
Abstract
The standard definition of PAC learning (Valiant 1984) requires learners to succeed under all distributions -- even ones that are intractable to sample from. This stands in contrast to samplable PAC learning (Blum, Furst, Kearns, and Lipton 1993), where learners only have to succeed under samplable distributions. We study this distinction and show that samplable PAC substantially expands the power of efficient learners. We first construct a concept class that requires exponential sample complexity in standard PAC but is learnable with polynomial sample complexity in samplable PAC. We then lift this statistical separation to the computational setting and obtain a separation relative to a random oracle. Our proofs center around a new complexity primitive, explicit evasive sets, that we introduce and study. These are sets for which membership is easy to determine but that are extremely hard to sample from. Our results extend to the online setting, where they similarly show how the landscape changes when the adversary is assumed to be efficient instead of computationally unbounded.
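
To make the "easy to test, hard to sample" asymmetry concrete, here is a toy sketch in Python. It is not the paper's construction or definition of an explicit evasive set. The set used here (strings whose SHA-256 digest starts with `k` zero bits), the function names `is_member` and `naive_sample`, and all parameters are assumptions chosen for illustration, and the hardness of hitting the set is only heuristic.

```python
import hashlib
import os


def is_member(x: bytes, k: int) -> bool:
    """Membership test (illustrative): does SHA-256(x) begin with k zero bits?

    This takes one hash evaluation, so membership is cheap to decide.
    Assumes 0 <= k <= 256.
    """
    digest = hashlib.sha256(x).digest()
    as_int = int.from_bytes(digest, "big")
    # Shifting away the low (256 - k) bits leaves the top k bits of the digest.
    return as_int >> (256 - k) == 0


def naive_sample(k: int, n_bytes: int = 16, max_tries: int = 1_000_000):
    """Rejection sampling: draw uniform strings until one lands in the set.

    Returns (element, tries) on success or (None, max_tries) on failure.
    Heuristically, about 2**k draws are needed, so the cost of this sampler
    grows exponentially in k while is_member stays cheap.
    """
    for tries in range(1, max_tries + 1):
        x = os.urandom(n_bytes)
        if is_member(x, k):
            return x, tries
    return None, max_tries


if __name__ == "__main__":
    for k in (4, 8, 12):
        x, tries = naive_sample(k)
        print(f"k={k}: found={x is not None}, tries={tries}")
```

Running the script for increasing `k` shows the number of draws growing roughly like `2**k`, while each membership check remains a single hash call. The analogy is loose: it only shows why a learner that faces samplable distributions might rarely or never see points from such a set.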
