Neural Interactive Proofs

Lewis Hammond; Sam Adam-Day

TL;DR

Neural interactive proofs study how a computationally bounded verifier can learn to interact with powerful, untrusted provers in order to solve tasks via verifiable proofs. The approach models tasks as prover-verifier games (PVGs) and trains neural provers and verifiers to optimise either the empirical risk $\mathcal{L}^{ER}$ or the worst-case loss $\mathcal{L}^{WC}$ on labelled data. Key contributions include a unifying PVG framework, new neural interactive proof (IP) protocols (including zero-knowledge variants), theoretical results relating these protocols to Stackelberg equilibria and to zero-knowledge, an open-source codebase, and empirical demonstrations on graph isomorphism and on code validation using large language models (LLMs). This work advances the scalable verification and safety of AI systems by enabling weaker verifiers to learn structured, verifiable interactions with stronger, learnt provers.
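To make the two training objectives concrete, here is a minimal sketch assuming the usual definitions restricted to a finite labelled dataset: $\mathcal{L}^{ER}$ as the mean per-example loss and $\mathcal{L}^{WC}$ as the maximum per-example loss, both with a 0-1 loss on the verifier's final decision. The callable `decide` is a hypothetical stand-in for an entire trained protocol (verifier plus provers) that maps an input to accept (1) or reject (0); it is not the authors' code or API.

```python
from typing import Any, Callable, Sequence, Tuple

# Hypothetical interface: `decide` represents a whole protocol run, mapping an
# input x to the verifier's final decision (1 = accept, 0 = reject).
Decide = Callable[[Any], int]
Dataset = Sequence[Tuple[Any, int]]  # pairs (input x, label y in {0, 1})


def zero_one_loss(decision: int, label: int) -> float:
    """Per-example loss: 1.0 if the verifier's decision is wrong, else 0.0."""
    return float(decision != label)


def empirical_risk(decide: Decide, data: Dataset) -> float:
    """Sketch of L^ER: the mean per-example loss over the labelled dataset."""
    return sum(zero_one_loss(decide(x), y) for x, y in data) / len(data)


def worst_case_loss(decide: Decide, data: Dataset) -> float:
    """Sketch of L^WC: the largest per-example loss over the labelled dataset."""
    return max(zero_one_loss(decide(x), y) for x, y in data)


if __name__ == "__main__":
    data = [(0, 1), (1, 0), (2, 1), (3, 1)]
    decide = lambda x: int(x % 2 == 0)  # errs only on x = 3
    print(empirical_risk(decide, data))   # 0.25
    print(worst_case_loss(decide, data))  # 1.0
```

A protocol that fails on a single example out of four thus scores well under the empirical risk but maximally badly under the worst-case loss, which is why the worst-case objective is the more demanding target when every individual verification must be trustworthy.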

Abstract

We consider the problem of how a trusted but computationally bounded agent (a 'verifier') can learn to interact with one or more powerful but untrusted agents ('provers') in order to solve a given task. More specifically, we study the case in which agents are represented using neural networks, and refer to solutions of this problem as neural interactive proofs. First, we introduce a unifying framework based on prover-verifier games, which generalises previously proposed interaction protocols. We then describe several new protocols for generating neural interactive proofs, and provide a theoretical comparison of both new and existing approaches. Finally, we support this theory with experiments in two domains: a toy graph isomorphism problem that illustrates the key ideas, and a code validation task using large language models. In so doing, we aim to create a foundation for future work on neural interactive proofs and their application in building safer AI systems.
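For intuition about the split between a weak verifier and a powerful prover, the sketch below implements the classical textbook interactive proof for graph non-isomorphism. It is not the authors' neural protocol: here both parties are hand-written, whereas neural interactive proofs would learn the verifier and prover strategies. The verifier only flips coins, relabels a graph, and checks one bit, while the prover may spend exponential time (brute-force search over all permutations). The helpers `make_graph`, `permute`, `isomorphic`, `prover` and `run_protocol` are hypothetical names introduced for this illustration.

```python
import itertools
import random
from typing import FrozenSet, Iterable, Sequence, Tuple

# A graph on vertices 0..n-1, stored as a set of undirected edges.
Graph = FrozenSet[FrozenSet[int]]


def make_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    return frozenset(frozenset(e) for e in edges)


def permute(graph: Graph, perm: Sequence[int]) -> Graph:
    """Relabel every vertex v as perm[v]."""
    return frozenset(frozenset(perm[v] for v in edge) for edge in graph)


def isomorphic(g: Graph, h: Graph, n: int) -> bool:
    """Brute-force check over all n! relabellings (exponential time)."""
    return any(permute(g, p) == h for p in itertools.permutations(range(n)))


def prover(g0: Graph, h: Graph, n: int) -> int:
    """Computationally unbounded prover: claims which graph h is a copy of."""
    return 0 if isomorphic(g0, h, n) else 1


def run_protocol(g0: Graph, g1: Graph, n: int, rounds: int, rng: random.Random) -> bool:
    """Verifier for the claim 'g0 and g1 are NOT isomorphic'; True means accept."""
    for _ in range(rounds):
        i = rng.randrange(2)             # secret coin flip
        perm = list(range(n))
        rng.shuffle(perm)                # uniformly random relabelling
        h = permute((g0, g1)[i], perm)   # send a disguised copy of g_i
        if prover(g0, h, n) != i:        # cheap check of the prover's answer
            return False
    return True


if __name__ == "__main__":
    triangle = make_graph([(0, 1), (1, 2), (0, 2)])
    path = make_graph([(0, 1), (1, 2)])
    print(run_protocol(triangle, path, n=3, rounds=20, rng=random.Random(0)))  # True
```

When the graphs are non-isomorphic, the honest prover always identifies which graph $H$ came from, so the verifier accepts. When they are isomorphic, $H$ has the same distribution for either coin value, so any prover answers each round correctly with probability at most $1/2$ and survives $k$ rounds with probability at most $2^{-k}$.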
