On the Power of Interactive Proofs for Learning

Tom Gur; Mohammad Mahdi Jahanara; Mohammad Mahdi Khodabandeh; Ninad Rajgopal; Bahar Salamatian; Igor Shinkar

TL;DR

This work advances the theory of verifying learning outcomes via interactive proofs by introducing doubly-efficient PAC-verification protocols for fundamental classes of Boolean functions. The authors develop sample-efficient interactive protocols that identify heavy Fourier characters, verify agnostic-learning results for the constant-depth circuit class $\mathsf{AC}^0[2]$, and verify $k$-juntas under the uniform distribution. The verifier's sample complexity stays small throughout: $\mathsf{poly}(t)$ for the $t$ heaviest Fourier characters, quasi-polynomial for $\mathsf{AC}^0[2]$, and $O(2^k)$, independent of $n$, for $k$-juntas, while the prover's runtime is quasi-polynomial or polynomial in the input size. A core contribution is a set of general transformations that convert tolerant testers and membership-query-based learners into PAC-verifiers that require far fewer samples than running the learner itself, aided by tools such as the Nisan-Wigderson reconstruction, distance estimators, and embedding-based query-to-sample reductions. The paper also shows that allowing unbounded provers yields distribution-free PAC-verification for arbitrary classes using only $O(1)$ labeled examples, which makes the model trivial unless double efficiency is required. Overall, the results show that interaction with a powerful but untrusted prover can substantially reduce the data and computation needed to validate a learning outcome, with implications for both the theory and practice of verifiable machine learning.

Abstract

We continue the study of doubly-efficient proof systems for verifying agnostic PAC learning, for which we obtain the following results.

- We construct an interactive protocol for learning the $t$ largest Fourier characters of a given function $f \colon \{0,1\}^n \to \{0,1\}$ up to an arbitrarily small error, wherein the verifier uses $\mathsf{poly}(t)$ random examples. This improves upon the Interactive Goldreich-Levin protocol of Goldwasser, Rothblum, Shafer, and Yehudayoff (ITCS 2021), whose sample complexity is $\mathsf{poly}(t,n)$.
- For agnostically learning the class $\mathsf{AC}^0[2]$ under the uniform distribution, we build on the work of Carmosino, Impagliazzo, Kabanets, and Kolokolova (APPROX/RANDOM 2017) and design an interactive protocol in which, given a function $f \colon \{0,1\}^n \to \{0,1\}$, the verifier learns the closest hypothesis up to a $\mathsf{polylog}(n)$ multiplicative factor, using quasi-polynomially many random examples. In contrast, this class has been notoriously resistant even to the construction of realisable learners (without a prover) using random examples.
- For agnostically learning $k$-juntas under the uniform distribution, we obtain an interactive protocol in which the verifier uses $O(2^k)$ random examples of a given function $f \colon \{0,1\}^n \to \{0,1\}$. Crucially, the sample complexity of the verifier is independent of $n$.

We also show that if we do not insist on doubly-efficient proof systems, then the model becomes trivial. Specifically, we show a protocol for an arbitrary class $\mathcal{C}$ of Boolean functions in the distribution-free setting, where the verifier uses $O(1)$ labeled examples to learn $f$.
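To give some intuition for why a prover can help with the first result, the sketch below shows the standard empirical estimate of a single Fourier coefficient $\hat{f}(S) = \mathbb{E}_x[(-1)^{f(x)} \chi_S(x)]$ from uniform random examples. It is not the paper's protocol: the helper names (`chi`, `estimate_coefficient`, `draw_examples`, `check_candidates`), the threshold, and the toy target function are all assumptions made for illustration. The point is only that, once a short list of candidate characters is supplied, a standard Hoeffding bound says the number of examples needed to estimate each listed coefficient depends on the list length and the accuracy, not on $n$. A real protocol must also prevent the prover from omitting heavy characters, which this sketch does not attempt.

```python
import random


def chi(S, x):
    """Parity character chi_S(x) = (-1)^(sum of x_i for i in S)."""
    return -1 if sum(x[i] for i in S) % 2 else 1


def estimate_coefficient(samples, S):
    """Empirical mean of (-1)^f(x) * chi_S(x) over labeled examples (x, f(x))."""
    total = sum((-1 if y else 1) * chi(S, x) for x, y in samples)
    return total / len(samples)


def draw_examples(f, n, m, rng):
    """Draw m uniform random examples (x, f(x)) with x in {0,1}^n."""
    examples = []
    for _ in range(m):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        examples.append((x, f(x)))
    return examples


def check_candidates(samples, candidates, threshold):
    """Keep the candidate sets whose estimated coefficient has magnitude >= threshold."""
    return [S for S in candidates
            if abs(estimate_coefficient(samples, S)) >= threshold]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20
    # Hypothetical target function: parity of coordinates 1, 4 and 7.
    f = lambda x: x[1] ^ x[4] ^ x[7]
    samples = draw_examples(f, n, 2000, rng)
    # Hypothetical list of candidate characters, as an untrusted party might supply.
    candidates = [(1, 4, 7), (0, 2), (3,)]
    print(check_candidates(samples, candidates, 0.5))
```

In this toy run the target is itself a parity, so its only nonzero coefficient sits on the set `(1, 4, 7)`, and the other two candidates have true coefficient zero.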

Paper Structure (50 sections, 36 theorems, 43 equations, 9 algorithms)


Introduction
Our Results
Learning Heavy Fourier Characters
Learning $\mathsf{AC}^0[2]$ Circuits
Learning Juntas
The Power of PAC-Verification with Unbounded Provers
Technical Overview
Proof Outline of Theorem \ref{thm:our-results-topfourier}
Discussion
Proof Outline of Theorem \ref{thm:our-results-ac0[2]-pac-verify}
Related Work
Future Directions
Acknowledgements
Preliminaries
Agnostic Learners
...and 35 more sections

Key Result

Theorem 1.1

There exists an interactive proof such that for any $f \colon \{0,1\}^n \to \{0,1\}$, any $\epsilon > 0$ and any $t \in \mathbb{N}$, the verifier uses at most $\mathsf{poly}(t/\epsilon)$ random examples and outputs a set $\tilde{\Lambda}_t$ that $\epsilon$-approximates $\Lambda_t$ with probability

Theorems & Definitions (89)

Theorem 1.1: Learning heavy Fourier characters (see \ref{thm:fourier-random} for a formal statement)
Theorem 1.2: PAC-verification for $\mathsf{AC}^0[p]$ (see \ref{thm:agnostic_ip_ac0} for a formal statement)
Theorem 1.3: PAC-verification for $k$-juntas (see \ref{thm:formal-statement-junta-random} for a formal statement)
Theorem 1.4: PAC-verification for $\mathsf{P/poly}$ (see \ref{thm:aip-ppoly-erm} for a formal statement)
Definition 2.1: Agnostic Learning a class $\mathcal{C}$ with hypothesis class $\mathcal{H}$
Definition 2.2: $(\alpha,\varepsilon,\delta)$-PAC-verifying $\mathcal{C}$
Definition 2.4: $(\epsilon, \delta)$-distance estimator
Definition 2.5: $(c_u, c_\ell)$-noise tolerant tester
Claim 2.6
proof
...and 79 more
