Limitations of Membership Queries in Testable Learning
Jane Lange, Mingda Qiao
TL;DR
The paper investigates the limits of membership queries in testable learning (TL-Q). It proves that MQs do not asymptotically beat sample-based distribution-specific learning in time, by giving a general reduction from refutation to TL-Q and introducing MQ-SQ with SQ-dimension-based lower bounds. The authors show TL-Q implies SQ refutation, and, combined with SQ lower bounds for natural classes (parities, juntas, decision trees), conclude that efficient MQ-based TL-Q cannot be made testable. They also connect these results to learning via refutation, and apply them to juntas and related structures, outlining both the strength and limits of MQ-based approaches. Overall, the work establishes a tight link between refutation complexity, TL-Q, and SQ-dimension, constraining the potential benefits of membership-query-driven speedups in testable learning.
Abstract
Membership queries (MQ) often yield speedups for learning tasks, particularly in the distribution-specific setting. We show that in the \emph{testable learning} model of Rubinfeld and Vasilyan [RV23], membership queries cannot decrease the time complexity of testable learning algorithms beyond the complexity of sample-only distribution-specific learning. In the testable learning model, the learner must output a hypothesis whenever the data distribution satisfies a desired property, and if it outputs a hypothesis, the hypothesis must be near-optimal. We give a general reduction from sample-based \emph{refutation} of boolean concept classes, as presented in [Vadhan17, KL18], to testable learning with queries (TL-Q). This yields lower bounds for TL-Q via the reduction from learning to refutation given in [KL18]. The result is that, relative to a concept class and a distribution family, no $m$-sample TL-Q algorithm can be super-polynomially more time-efficient than the best $m$-sample PAC learner. Finally, we define a class of ``statistical'' MQ algorithms that encompasses many known distribution-specific MQ learners, such as those based on influence estimation or subcube-conditional statistical queries. We show that TL-Q algorithms in this class imply efficient statistical-query refutation and learning algorithms. Thus, combined with known SQ dimension lower bounds, our results imply that these efficient membership query learners cannot be made testable.
