On the Power of Interactive Proofs for Learning
Tom Gur, Mohammad Mahdi Jahanara, Mohammad Mahdi Khodabandeh, Ninad Rajgopal, Bahar Salamatian, Igor Shinkar
TL;DR
This work advances the theory of verifying learning outcomes via interactive proofs by introducing doubly-efficient PAC-verification protocols for fundamental classes of Boolean functions. The authors develop sample-efficient interactive protocols that identify heavy Fourier characters, verify agnostic learning of constant-depth circuits with parity gates ($\mathsf{AC}^0[2]$), and verify agnostic learning of $k$-juntas under the uniform distribution. In each case the verifier's sample complexity stays small: $\mathsf{poly}(t)$ for the $t$ heaviest Fourier characters, quasi-polynomial for $\mathsf{AC}^0[2]$, and $O(2^k)$ for $k$-juntas, independent of $n$. The honest prover runs in polynomial or quasi-polynomial time in the input size. A core contribution is a set of general transformations that convert tolerant testers and membership-query learners into PAC-verifiers that need far fewer samples than running the learner itself, aided by tools such as Nisan-Wigderson reconstruction, distance estimators, and embedding-based reductions from queries to samples. The paper also shows that if the prover is allowed to be unbounded (that is, the proof system need not be doubly efficient), the model becomes trivial: there is a distribution-free protocol for an arbitrary class of Boolean functions in which the verifier uses only $O(1)$ labeled examples. Overall, the results point to new ways of validating machine-learning outcomes efficiently and robustly against powerful but untrusted provers, with implications for both the theory and the practice of verifiable AI systems.
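To see why checking a prover's answer can be much cheaper than finding it, consider heavy Fourier characters. The following is a minimal Python sketch under stated assumptions, not the authors' protocol: it uses the standard $\pm 1$ convention $F(x) = (-1)^{f(x)}$ and $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$, and the verifier re-estimates $\hat F(S) = \mathbb{E}_x[F(x)\chi_S(x)]$ from its own uniformly random examples for each set $S$ the prover claims is heavy. The helper names (`draw_examples`, `estimate_coefficient`, `check_claimed_characters`) and the acceptance threshold are hypothetical. A real protocol must also prevent the prover from omitting heavy characters, which this check does not address.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Point = Tuple[int, ...]
Example = Tuple[Point, int]  # (x in {0,1}^n, label f(x) in {0,1})


def draw_examples(f: Callable[[Point], int], n: int, m: int,
                  rng: random.Random) -> List[Example]:
    """Draw m uniformly random labeled examples (x, f(x)) with x in {0,1}^n."""
    examples = []
    for _ in range(m):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        examples.append((x, f(x)))
    return examples


def estimate_coefficient(examples: List[Example], S: FrozenSet[int]) -> float:
    """Empirical mean of F(x) * chi_S(x), where F(x) = (-1)^f(x)."""
    total = 0
    for x, y in examples:
        parity = sum(x[i] for i in S) % 2          # chi_S(x) = (-1)^parity
        total += 1 if (y ^ parity) == 0 else -1    # F(x)*chi_S(x) = (-1)^(y xor parity)
    return total / len(examples)


def check_claimed_characters(examples: List[Example],
                             claimed: Sequence[FrozenSet[int]],
                             threshold: float) -> Tuple[bool, Dict[FrozenSet[int], float]]:
    """Accept iff every claimed set has estimated |coefficient| >= threshold."""
    estimates = {S: estimate_coefficient(examples, S) for S in claimed}
    return all(abs(v) >= threshold for v in estimates.values()), estimates
```

Since every term lies in $\{-1, 1\}$, Hoeffding's inequality and a union bound give that $O(\log(t/\delta)/\varepsilon^2)$ examples suffice to estimate all $t$ claimed coefficients to within $\pm\varepsilon$ with probability at least $1-\delta$, and this count does not depend on $n$. For instance, with `f = lambda x: x[0] ^ x[2]` we have $F = \chi_{\{0,2\}}$, so the estimate for `frozenset({0, 2})` is exactly `1.0` on any sample.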
Abstract
We continue the study of doubly-efficient proof systems for verifying agnostic PAC learning, for which we obtain the following results.
- We construct an interactive protocol for learning the $t$ largest Fourier characters of a given function $f \colon \{0,1\}^n \to \{0,1\}$ up to an arbitrarily small error, wherein the verifier uses $\mathsf{poly}(t)$ random examples. This improves upon the Interactive Goldreich-Levin protocol of Goldwasser, Rothblum, Shafer, and Yehudayoff (ITCS 2021), whose sample complexity is $\mathsf{poly}(t,n)$.
- For agnostically learning the class $\mathsf{AC}^0[2]$ under the uniform distribution, we build on the work of Carmosino, Impagliazzo, Kabanets, and Kolokolova (APPROX/RANDOM 2017) and design an interactive protocol in which, given a function $f \colon \{0,1\}^n \to \{0,1\}$, the verifier learns the closest hypothesis up to a $\mathsf{polylog}(n)$ multiplicative factor, using quasi-polynomially many random examples. In contrast, this class has been notoriously resistant even to the construction of realisable learners (without a prover) using random examples.
- For agnostically learning $k$-juntas under the uniform distribution, we obtain an interactive protocol in which the verifier uses $O(2^k)$ random examples of a given function $f \colon \{0,1\}^n \to \{0,1\}$. Crucially, the sample complexity of the verifier is independent of $n$.

We also show that if we do not insist on doubly-efficient proof systems, then the model becomes trivial. Specifically, we give a protocol for an arbitrary class $\mathcal{C}$ of Boolean functions in the distribution-free setting, in which the verifier uses $O(1)$ labeled examples to learn $f$.
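For $k$-juntas, a small illustration of why the cost can scale with $2^k$ rather than $n$: if the prover names a candidate set $J$ of $k$ relevant coordinates, the verifier can fit the best function of $x_J$ by a majority vote in each of the $2^k$ cells and estimate that hypothesis's error on fresh examples. This is a hypothetical Python sketch (`draw_examples`, `fit_junta`, and `empirical_error` are illustrative names), not the paper's protocol; in particular, it leaves open how the verifier becomes convinced that $J$ is close to optimal, which a real protocol must handle.

```python
import random
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[int, ...]
Example = Tuple[Point, int]  # (x in {0,1}^n, label in {0,1})


def draw_examples(f: Callable[[Point], int], n: int, m: int,
                  rng: random.Random) -> List[Example]:
    """Draw m uniformly random labeled examples (x, f(x)) with x in {0,1}^n."""
    examples = []
    for _ in range(m):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        examples.append((x, f(x)))
    return examples


def fit_junta(examples: List[Example], J: Sequence[int]) -> Callable[[Point], int]:
    """Majority-vote hypothesis that depends only on the coordinates in J."""
    counts: Dict[Point, List[int]] = defaultdict(lambda: [0, 0])
    for x, y in examples:
        counts[tuple(x[i] for i in J)][y] += 1      # tally labels per cell x_J
    table: Dict[Point, int] = {}
    for cell in product((0, 1), repeat=len(J)):
        zeros, ones = counts.get(cell, (0, 0))      # .get does not insert defaults
        table[cell] = 1 if ones > zeros else 0      # ties and unseen cells map to 0
    return lambda x: table[tuple(x[i] for i in J)]


def empirical_error(h: Callable[[Point], int], examples: List[Example]) -> float:
    """Fraction of examples on which h disagrees with the label."""
    return sum(h(x) != y for x, y in examples) / len(examples)


if __name__ == "__main__":
    rng = random.Random(0)
    f = lambda x: x[1] & x[3]                 # a 2-junta on n = 8 variables
    train = draw_examples(f, 8, 400, rng)
    test = draw_examples(f, 8, 400, rng)
    print(empirical_error(fit_junta(train, (1, 3)), test))  # correct J
    print(empirical_error(fit_junta(train, (0, 2)), test))  # wrong J: error near 1/4
```

Under the uniform distribution each cell receives about $m/2^k$ examples, so a constant number of examples per cell already means $m$ grows like $2^k$, with no dependence on $n$ once $J$ is fixed.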
