Limitations of Membership Queries in Testable Learning

Jane Lange, Mingda Qiao

TL;DR

The paper investigates the limits of membership queries (MQs) in testable learning with queries (TL-Q). It proves that MQs cannot reduce the time complexity of testable learning below that of sample-based distribution-specific learning, by giving a general reduction from refutation to TL-Q. It also introduces a class of statistical MQ algorithms (MQ-SQ) and proves lower bounds for this class in terms of SQ dimension. The authors show that TL-Q algorithms in this class imply SQ refutation and, combined with SQ lower bounds for natural classes (parities, juntas, decision trees), conclude that the corresponding efficient MQ learners cannot be made testable. They also connect these results to learning via refutation and apply them to juntas and related structures, outlining both the strengths and the limits of MQ-based approaches. Overall, the work links refutation complexity, TL-Q, and SQ dimension, and thereby constrains the speedups that membership queries can offer in testable learning.

Abstract

Membership queries (MQ) often yield speedups for learning tasks, particularly in the distribution-specific setting. We show that in the testable learning model of Rubinfeld and Vasilyan [RV23], membership queries cannot decrease the time complexity of testable learning algorithms beyond the complexity of sample-only distribution-specific learning. In the testable learning model, the learner must output a hypothesis whenever the data distribution satisfies a desired property, and if it outputs a hypothesis, the hypothesis must be near-optimal. We give a general reduction from sample-based refutation of Boolean concept classes, as presented in [Vadhan17, KL18], to testable learning with queries (TL-Q). This yields lower bounds for TL-Q via the reduction from learning to refutation given in [KL18]. The result is that, relative to a concept class and a distribution family, no $m$-sample TL-Q algorithm can be super-polynomially more time-efficient than the best $m$-sample PAC learner. Finally, we define a class of "statistical" MQ algorithms that encompasses many known distribution-specific MQ learners, such as those based on influence estimation or subcube-conditional statistical queries. We show that TL-Q algorithms in this class imply efficient statistical-query refutation and learning algorithms. Thus, combined with known SQ dimension lower bounds, our results imply that these efficient membership query learners cannot be made testable.
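
The abstract names influence estimation as one technique used by distribution-specific MQ learners. As a rough illustration of what such a learner does with its queries, the following is a minimal sketch that assumes the uniform distribution over $\{0,1\}^n$ and the standard definition $\mathrm{Inf}_i(f) = \Pr_x[f(x) \neq f(x^{\oplus i})]$. The names `estimate_influence`, `num_trials`, and `parity_on_first_two` are hypothetical and do not come from the paper.

```python
import random
from typing import Callable, List, Optional


def estimate_influence(
    f: Callable[[List[int]], int],
    n: int,
    i: int,
    num_trials: int = 2000,
    rng: Optional[random.Random] = None,
) -> float:
    """Estimate Inf_i(f) = Pr_x[f(x) != f(x with bit i flipped)], x uniform on {0,1}^n.

    Assumption: f is a membership-query oracle that can be evaluated on any input.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    disagreements = 0
    for _ in range(num_trials):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = list(x)
        y[i] ^= 1  # y is chosen by the learner, not sampled: this step needs MQ access
        if f(x) != f(y):
            disagreements += 1
    return disagreements / num_trials


def parity_on_first_two(x: List[int]) -> int:
    """A 2-junta on 5 bits: only coordinates 0 and 1 are relevant."""
    return x[0] ^ x[1]


relevant = estimate_influence(parity_on_first_two, n=5, i=0)    # 1.0: every flip changes the parity
irrelevant = estimate_influence(parity_on_first_two, n=5, i=4)  # 0.0: coordinate 4 is ignored
```

The second query in each trial is the point at which random examples alone would not suffice, since the learner needs the label of a point it constructs itself.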

Paper Structure

This paper contains 47 sections, 15 theorems, 101 equations, 1 algorithm.

Key Result

Theorem 1.2

If a concept class $\mathcal{C}$ is agnostically testably learnable with queries in time $t$ over a distribution $\mathcal{D}$, then it is agnostically learnable with random examples in time $\mathop{\mathrm{poly}}(t)$ over $\mathcal{D}$ as well.
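
Read contrapositively, Theorem 1.2 converts a time lower bound for sample-based agnostic learning into a time lower bound for TL-Q. The following is a sketch of that step, where $T_{\mathrm{ex}}(\mathcal{C},\mathcal{D})$ denotes the best running time of an agnostic learner with random examples and $c \ge 1$ is the exponent hidden in $\mathop{\mathrm{poly}}(t)$. Both symbols are notation assumed here for illustration, not the paper's.

```latex
\[
  \underbrace{T_{\mathrm{ex}}(\mathcal{C},\mathcal{D}) \le t^{c}}_{\text{Theorem 1.2, for a TL-Q algorithm running in time } t}
  \quad\Longrightarrow\quad
  t \;\ge\; T_{\mathrm{ex}}(\mathcal{C},\mathcal{D})^{1/c}.
\]
```

In particular, under this reading, a super-polynomial lower bound on $T_{\mathrm{ex}}$ yields a super-polynomial lower bound on the running time of any TL-Q algorithm for $\mathcal{C}$ over $\mathcal{D}$.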

Theorems & Definitions (43)

  • Theorem 1.2: \ref{cor:tlq-agnostic}, informal
  • Corollary 1.3
  • Theorem 1.4: \ref{thm:MQ-SQ-lower-bound-in-SQ-DIM}, informal
  • Definition 1.1: Exact refutation over the uniform distribution, informal
  • Conjecture 1
  • Definition 2.1: Distance of functions and distance to a concept class
  • Definition 2.2: $\eta$-refutation
  • Definition 2.3: Biased $(\alpha,\eta)$-refutation
  • Definition 2.4: Weak agnostic learning
  • Lemma 2.1: Learning by refutation: Lemma 6 of KL18
  • ...and 33 more