Black-Box Access is Insufficient for Rigorous AI Audits

Stephen Casper; Carson Ezell; Charlotte Siegmann; Noam Kolt; Taylor Lynn Curtis; Benjamin Bucknall; Andreas Haupt; Kevin Wei; Jérémy Scheurer; Marius Hobbhahn; Lee Sharkey; Satyapriya Krishna; Marvin Von Hagen; Silas Alberti; Alan Chan; Qinyi Sun; Michael Gerovitch; David Bau; Max Tegmark; David Krueger; Dylan Hadfield-Menell

TL;DR

The paper argues that black-box access is insufficient for rigorous AI audits. It lays out a taxonomy of access (black-box, grey-box, white-box, and outside-the-box) and shows how white-box and outside-the-box access enable deeper scrutiny: stronger attacks, more thorough interpretability analysis, and examination of training and deployment information. It describes methods and safeguards for conducting such audits securely (APIs, secure environments, legal frameworks) and discusses regulatory implications, auditor independence, and incentive structures. The authors conclude that transparency about the access and methods used by auditors is essential for interpreting audit results, and that higher levels of access allow substantially more scrutiny, provided audits are designed to avoid leaks and misaligned incentives. Overall, the work advocates for a stronger, securely enabled auditing ecosystem to improve accountability and public trust in AI systems.

Abstract

External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning. Meanwhile, outside-the-box access to training and deployment information (e.g., methodology, code, documentation, data, deployment details, findings from internal evaluations) allows auditors to scrutinize the development process and design more targeted evaluations. In this paper, we examine the limitations of black-box audits and the advantages of white- and outside-the-box audits. We also discuss technical, physical, and legal safeguards for performing these audits with minimal security risks. Given that different forms of access can lead to very different levels of evaluation, we conclude that (1) transparency regarding the access and methods used by auditors is necessary to properly interpret audit results, and (2) white- and outside-the-box access allow for substantially more scrutiny than black-box access alone.

Paper Structure (17 sections, 1 figure, 1 table)

Introduction
Background
Black, Grey, White, and Outside-the-Box Access
Regulatory Frameworks' Reliance on Audits
Audits in the Status Quo
Limitations of Black-Box Access
Advantages of White-Box Access
White-box attack algorithms are more effective and efficient.
White-box interpretability tools aid in diagnostics.
Fine-tuning reveals risks from latent knowledge or post-deployment modifications.
Advantages of Outside-the-Box Access
Methods to Address Security Risks
Discussion
Motivations for External Audits
Technical Assistance as a form of Outside-the-Box Access
...and 2 more sections

Figures (1)

Figure 1: Black-box access lets auditors query the system and analyze the resulting outputs. Grey-box access lets auditors access limited internal information. White-box access lets auditors access the full system. Outside-the-box access gives auditors contextual information. In this paper, we argue that white- and outside-the-box access are key for rigorous AI audits.
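
The access taxonomy can be illustrated with a small data-structure sketch. This is a minimal illustration, not something taken from the paper: the names `Access`, `EXPOSES`, `exposed`, `can_run`, and `GRADIENT_ATTACK` are hypothetical, the resource labels paraphrase the abstract and the Figure 1 caption, and treating grey-box and white-box access as supersets of black-box access is an assumption made for the example.

```python
from enum import Enum


class Access(Enum):
    """The four access levels named in the paper's taxonomy."""
    BLACK_BOX = "black-box"
    GREY_BOX = "grey-box"
    WHITE_BOX = "white-box"
    OUTSIDE_THE_BOX = "outside-the-box"


# Hypothetical mapping from access level to the resources it exposes.
# Labels paraphrase the abstract; the superset relation is an assumption.
EXPOSES = {
    Access.BLACK_BOX: {"queries", "outputs"},
    Access.GREY_BOX: {"queries", "outputs", "limited internals"},
    Access.WHITE_BOX: {"queries", "outputs", "weights", "activations", "gradients"},
    Access.OUTSIDE_THE_BOX: {
        "methodology", "code", "documentation", "data",
        "deployment details", "internal evaluation findings",
    },
}


def exposed(granted):
    """Return the union of resources exposed by the granted access levels."""
    resources = set()
    for level in granted:
        resources |= EXPOSES[level]
    return resources


def can_run(required, granted):
    """True if every resource an audit method needs is exposed."""
    return set(required) <= exposed(granted)


# Illustrative requirement set for a gradient-based attack (an assumption).
GRADIENT_ATTACK = {"queries", "outputs", "gradients"}

print(can_run(GRADIENT_ATTACK, {Access.BLACK_BOX}))  # False
print(can_run(GRADIENT_ATTACK, {Access.WHITE_BOX}))  # True
```

In this sketch, outside-the-box access is modeled as complementary to the other levels rather than as a rung above them, which matches the caption's description of it as contextual information.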
