Table of Contents
Fetching ...

Neural Interactive Proofs

Lewis Hammond, Sam Adam-Day

TL;DR

Neural interactive proofs study how a computationally bounded verifier can learn to interact with powerful untrusted provers to solve tasks via verifiable proofs. The approach models tasks as prover-verifier games and trains neural provers and verifiers to optimise either empirical risk $\mathcal{L}^{ER}$ or worst-case loss $\mathcal{L}^{WC}$ on labelled data. Key contributions include a unifying PVG framework, new neural IP protocols (including zero-knowledge variants), theoretical links to Stackelberg equilibria and zero-knowledge, and an open-source codebase; plus empirical demonstrations on graph isomorphism and code validation using LLMs. This work advances scalable verification and safety of AI systems by enabling learnable, provably structured interactions between weaker verifiers and stronger, learnt provers.

Abstract

We consider the problem of how a trusted, but computationally bounded agent (a 'verifier') can learn to interact with one or more powerful but untrusted agents ('provers') in order to solve a given task. More specifically, we study the case in which agents are represented using neural networks and refer to solutions of this problem as neural interactive proofs. First we introduce a unifying framework based on prover-verifier games, which generalises previously proposed interaction protocols. We then describe several new protocols for generating neural interactive proofs, and provide a theoretical comparison of both new and existing approaches. Finally, we support this theory with experiments in two domains: a toy graph isomorphism problem that illustrates the key ideas, and a code validation task using large language models. In so doing, we aim to create a foundation for future work on neural interactive proofs and their application in building safer AI systems.

Neural Interactive Proofs

TL;DR

Neural interactive proofs study how a computationally bounded verifier can learn to interact with powerful untrusted provers to solve tasks via verifiable proofs. The approach models tasks as prover-verifier games and trains neural provers and verifiers to optimise either empirical risk or worst-case loss on labelled data. Key contributions include a unifying PVG framework, new neural IP protocols (including zero-knowledge variants), theoretical links to Stackelberg equilibria and zero-knowledge, and an open-source codebase; plus empirical demonstrations on graph isomorphism and code validation using LLMs. This work advances scalable verification and safety of AI systems by enabling learnable, provably structured interactions between weaker verifiers and stronger, learnt provers.

Abstract

We consider the problem of how a trusted, but computationally bounded agent (a 'verifier') can learn to interact with one or more powerful but untrusted agents ('provers') in order to solve a given task. More specifically, we study the case in which agents are represented using neural networks and refer to solutions of this problem as neural interactive proofs. First we introduce a unifying framework based on prover-verifier games, which generalises previously proposed interaction protocols. We then describe several new protocols for generating neural interactive proofs, and provide a theoretical comparison of both new and existing approaches. Finally, we support this theory with experiments in two domains: a toy graph isomorphism problem that illustrates the key ideas, and a code validation task using large language models. In so doing, we aim to create a foundation for future work on neural interactive proofs and their application in building safer AI systems.

Paper Structure

This paper contains 45 sections, 14 theorems, 37 equations, 17 figures, 2 tables.

Key Result

Proposition 6

There is a probabilistic decision problem $(X, S, \mathop{\mathrm{\mathbb{P}}}\nolimits)$ and an adp game $\mathcal{G}$ such that -- even though there exists some valid interactive proof protocol $\langle\delta^p, \sigma^v_\star\rangle$ with $\epsilon_{\text{c}} = 0$ -- the fact that $\langle\delta^

Figures (17)

  • Figure 1: On receiving input $x$ from distribution $\mathop{\mathrm{\mathbb{P}}}\nolimits$ the agents exchange messages before the verifier decides on an output $m_T$, which is compared to the true label $y$.
  • Figure 2: Test accuracies on the graph isomorphism task across (a) interaction protocols and (b) model sizes for nip, compared to the baseline of running the protocol with a random prover; and (c) differences in message sequence likelihood for varying prover reward coefficients in zk-nip.
  • Figure 3: Performance of each protocol on the code validation task, measured by: (a) mean test accuracy; (b) percentage of data points where the verifier always fails; and (c) precision and recall.
  • Figure 4: The effect of 'stabilised' expert iteration, measured by: (a) verifier acceptance rate; (b) mean test accuracy; and (c) percentage of data points where the verifier always fails.
  • Figure 5: Mean test accuracy of the NIP model as a function of dataset size, shown on a logarithmic scale.
  • ...and 12 more figures

Theorems & Definitions (32)

  • Example 1
  • Definition 2: Goldwasser1985Goldreich2001
  • Definition 3: Goldwasser1985Goldreich2001
  • Definition 4
  • Definition 5
  • Definition 6
  • Proposition 6
  • Theorem 7
  • Proposition 7
  • Proposition 7
  • ...and 22 more