Is Efficient PAC Learning Possible with an Oracle That Responds 'Yes' or 'No'?

Constantinos Daskalakis, Noah Golowich

TL;DR

This work shows that empirical risk minimization is not strictly necessary for efficient learning in the PAC framework. Using a weaker, bit-valued oracle (first a weak consistency oracle, later a weak ERM oracle), the authors design oracle-efficient algorithms that learn realizable partial binary concept classes with polynomial dependence on the VC dimension, and extend these results to agnostic learning, to multiclass learning via the Natarajan dimension, and to real-valued regression via the fat-shattering dimension. The core technical approach combines a randomized orientation of the one-inclusion graph with boosting (AdaBoost) and uses sample compression for generalization, giving sample complexity that is optimal up to polynomial factors in the respective complexity measures. The results show that using the weaker oracle costs only a polynomial factor, and they leave open questions about tighter dependencies (e.g., closing the gap from $\tilde{O}(d_{VC}^3)$ to $O(d_{VC})$) and about broader applicability to contextual bandits, online learning, and reinforcement learning. These findings matter for the theory of query-efficient learning and may have practical impact in settings where gradient-based ERM is costly or impractical, since they show that limited, information-sparse queries can suffice for learning with strong guarantees.

Abstract

The empirical risk minimization (ERM) principle has been highly impactful in machine learning, leading both to near-optimal theoretical guarantees for ERM-based learning algorithms as well as driving many of the recent empirical successes in deep learning. In this paper, we investigate the question of whether the ability to perform ERM, which computes a hypothesis minimizing empirical risk on a given dataset, is necessary for efficient learning: in particular, is there a weaker oracle than ERM which can nevertheless enable learnability? We answer this question affirmatively, showing that in the realizable setting of PAC learning for binary classification, a concept class can be learned using an oracle which only returns a single bit indicating whether a given dataset is realizable by some concept in the class. The sample complexity and oracle complexity of our algorithm depend polynomially on the VC dimension of the hypothesis class, thus showing that there is only a polynomial price to pay for use of our weaker oracle. Our results extend to the agnostic learning setting with a slight strengthening of the oracle, as well as to the partial concept, multiclass and real-valued learning settings. In the setting of partial concept classes, prior to our work no oracle-efficient algorithms were known, even with a standard ERM oracle. Thus, our results address a question of Alon et al. (2021) who asked whether there are algorithmic principles which enable efficient learnability in this setting.
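
To make the single-bit oracle concrete, here is a minimal sketch for a toy class chosen purely for illustration: thresholds on the real line, $h_t(x) = \mathbb{1}[x \ge t]$. The function names `weak_consistency_oracle` and `forced_label` are hypothetical and do not come from the paper, and the labeling trick shown is only a simple way to extract information from yes/no answers, not the authors' learning algorithm.

```python
from typing import List, Optional, Tuple

Example = Tuple[float, int]  # (point, label in {0, 1})


def weak_consistency_oracle(dataset: List[Example]) -> bool:
    """Return one bit: is the dataset realizable by some threshold
    h_t(x) = 1[x >= t]? (Toy class, assumed for illustration.)"""
    zeros = [x for x, y in dataset if y == 0]
    ones = [x for x, y in dataset if y == 1]
    if not zeros or not ones:
        # A threshold far to the left or right fits a one-label dataset.
        return True
    # We need some t with max(zeros) < t <= min(ones).
    return max(zeros) < min(ones)


def forced_label(sample: List[Example], x: float) -> Optional[int]:
    """Use two oracle calls to check whether the sample forces x's label.

    Returns 0 or 1 if exactly one labeling of x keeps the data realizable,
    and None otherwise (both labelings, or neither, are realizable).
    """
    can_be_0 = weak_consistency_oracle(sample + [(x, 0)])
    can_be_1 = weak_consistency_oracle(sample + [(x, 1)])
    if can_be_0 and not can_be_1:
        return 0
    if can_be_1 and not can_be_0:
        return 1
    return None


sample = [(0.1, 0), (0.4, 0), (0.7, 1), (0.9, 1)]
print(forced_label(sample, 0.2))  # 0: labeling 0.2 as 1 breaks realizability
print(forced_label(sample, 0.8))  # 1: labeling 0.8 as 0 breaks realizability
print(forced_label(sample, 0.5))  # None: both labelings remain realizable
```

The oracle never returns a hypothesis, only a bit. The point 0.5 shows why this is weaker than ERM: the bit alone does not say how to label points the sample leaves undetermined.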

Paper Structure

This paper contains 43 sections, 29 theorems, 79 equations, 8 algorithms.

Key Result

Lemma 2.1

There is a constant $C > 0$ so that the following holds. Consider any domain $\mathcal{X}$ and label set $\mathcal{Y}$, together with a loss function $\ell : \mathcal{Y} \times \mathcal{Y} \to [0,1]$. For any compression scheme $(\kappa, \rho)$, any $n \in \mathbb{N}$, and any $\delta \in (0,1)$, for [displayed inequality missing from this extract]. In particular, if $\widehat{\mathrm{er}}_{S, \ell}(\rho(\kappa(S))) = 0$, then [displayed inequality missing from this extract].
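
For intuition about the zero-empirical-error case, the sketch below evaluates the classical realizable-case compression bound rather than the lemma's exact statement, whose constant $C$ and precise form are not reproduced here. It assumes the compression set is an unordered subset of exactly $k$ of the $n$ examples with no side information, in which case a union bound over the $\binom{n}{k}$ possible subsets gives, with probability at least $1 - \delta$, error at most $(\ln \binom{n}{k} + \ln(1/\delta)) / (n - k)$. The function name is hypothetical.

```python
import math


def realizable_compression_bound(n: int, k: int, delta: float) -> float:
    """Classical bound on the true error of a size-k compression scheme
    whose reconstructed hypothesis is consistent with all n examples.

    Assumption (not from the paper): the compression set is an unordered
    subset of exactly k examples, with no side information.
    """
    if not 0 <= k < n:
        raise ValueError("need 0 <= k < n")
    if not 0.0 < delta < 1.0:
        raise ValueError("need 0 < delta < 1")
    # Union bound over all C(n, k) candidate compression sets; each fixed
    # hypothesis survives the n - k held-out points w.p. <= exp(-eps*(n-k)).
    log_num_sets = math.log(math.comb(n, k))
    return (log_num_sets + math.log(1.0 / delta)) / (n - k)


for n in (500, 1000, 2000):
    print(n, realizable_compression_bound(n, k=10, delta=0.05))
```

The bound shrinks roughly like $k \log n / n$, which is why a small compression size translates directly into good generalization.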

Theorems & Definitions (67)

  • Definition 2.1: Oracle-efficient PAC learning
  • Definition 2.2: Oracle-efficient agnostic PAC learning
  • Definition 2.3: Weak consistency oracle
  • Definition 2.4: Range consistency oracle
  • Definition 2.5: Weak ERM oracle
  • Definition 2.6: Strong ERM oracle
  • Definition 2.7: Sample compression scheme; littlestone2003relating, david2016supervised
  • Lemma 2.1: Generalization-by-compression; Theorem 2.1 of david2016supervised
  • Definition 2.8: One-inclusion graph
  • Theorem 3.1: Oracle-efficient partial concept class learning
  • ...and 57 more
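
As an illustration of the one-inclusion graph named in Definition 2.8, the sketch below builds it under the standard definition for total binary classes: vertices are the distinct label patterns the class induces on a fixed sequence of points, and two vertices are joined when they differ on exactly one point. The paper works with partial concepts, which this simplified example does not model, and the function name and threshold class are assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Pattern = Tuple[int, ...]


def one_inclusion_graph(
    hypotheses: Sequence[Callable[[float], int]],
    points: Sequence[float],
) -> Tuple[List[Pattern], List[Tuple[Pattern, Pattern]]]:
    """Build the one-inclusion graph of a finite set of total binary
    hypotheses restricted to `points`."""
    # Vertices: distinct label patterns induced on the points.
    vertices = sorted({tuple(h(x) for x in points) for h in hypotheses})
    # Edges: pairs of patterns at Hamming distance exactly 1.
    edges = [
        (u, v)
        for u, v in combinations(vertices, 2)
        if sum(a != b for a, b in zip(u, v)) == 1
    ]
    return vertices, edges


# Toy class (assumed): thresholds h_t(x) = 1[x >= t] on three points.
thresholds = [0.5, 1.5, 2.5, 3.5]
hypotheses = [lambda x, t=t: int(x >= t) for t in thresholds]
vertices, edges = one_inclusion_graph(hypotheses, points=[1.0, 2.0, 3.0])
print(vertices)  # [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
print(len(edges))  # 3: the patterns form a path
```

Each edge corresponds to a point whose label the rest of the sample leaves ambiguous, so orienting an edge amounts to choosing a prediction for that point.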