Position: Ensuring mutual privacy is necessary for effective external evaluation of proprietary AI systems
Ben Bucknall, Robert F. Trager, Michael A. Osborne
TL;DR
This paper addresses the problem of mutual privacy in external evaluation of proprietary AI systems, formalizing it with sets $I_M$, $I_E$, $P_M$, $P_E$, $R_M$, and $R_E$, and clarifying that mutual privacy requires both $P_M \cap R_E \neq \emptyset$ and $P_E \cap R_M \neq \emptyset$. It surveys the information that model owners and evaluators may need to keep private or share, highlighting conflicts over weights, architecture, datasets, versions, logs, and evaluation methodologies. The authors critique current evaluation practices for neglecting evaluator privacy and outline potential solutions, including trusted intermediaries, cryptographic software (e.g., zero-knowledge proofs), and hardware-based enclaves to enable privacy-preserving assessment. The goal is to improve the reliability, independence, and security of external evaluations of proprietary AI systems in practice, with implications for governance and safety.
Abstract
The external evaluation of AI systems is increasingly recognised as a crucial approach for understanding their potential risks. However, facilitating external evaluation in practice faces significant challenges in balancing evaluators' need for system access with AI developers' privacy and security concerns. Additionally, evaluators have reason to protect their own privacy, for example to maintain the integrity of held-out test sets. We refer to the challenge of ensuring both developers' and evaluators' privacy as one of providing mutual privacy. In this position paper, we argue that (i) addressing this mutual privacy challenge is essential for effective external evaluation of AI systems, and (ii) current methods for facilitating external evaluation inadequately address this challenge, particularly when it comes to preserving evaluators' privacy. In making these arguments, we formalise the mutual privacy problem; examine the privacy and access requirements of both model owners and evaluators; and explore potential solutions to this challenge, including cryptographic and hardware-based approaches.
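
To make the set-based conditions from the TL;DR concrete, the following is a minimal sketch rather than the paper's own formalisation. It assumes a particular reading of the symbols: $P_M$ and $P_E$ are the items the model owner and the evaluator want to keep private, and $R_E$ and $R_M$ are the items each party requires from the other. The helper `privacy_conflicts` and all example items are hypothetical.

```python
# Hypothetical illustration: the variable names mirror the TL;DR's symbols,
# but their readings and the example items are assumptions, not the
# paper's definitions.

def privacy_conflicts(private: set, required_by_other: set) -> set:
    """Return the items one party wants private that the other party requires."""
    return private & required_by_other

# Assumed reading: P_M = what the model owner wants to keep private,
# R_E = what the evaluator requires from the model owner.
P_M = {"weights", "architecture", "training_data"}
R_E = {"weights", "query_access"}

# Assumed reading: P_E = what the evaluator wants to keep private,
# R_M = what the model owner requires from the evaluator.
P_E = {"held_out_test_set", "evaluation_methodology"}
R_M = {"held_out_test_set"}

owner_side = privacy_conflicts(P_M, R_E)      # P_M ∩ R_E
evaluator_side = privacy_conflicts(P_E, R_M)  # P_E ∩ R_M

# Both intersections are non-empty here, so privacy is at stake on both sides.
is_mutual = bool(owner_side) and bool(evaluator_side)
```

Under this assumed reading, a non-empty intersection marks information that one party wants to protect but the other needs, which is where a privacy-preserving mechanism would have to intervene.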
