Towards Publicly Accountable Frontier LLMs: Building an External Scrutiny Ecosystem under the ASPIRE Framework

Markus Anderljung, Everett Thornton Smith, Joe O'Brien, Lisa Soder, Benjamin Bucknall, Emma Bluemke, Jonas Schuett, Robert Trager, Lacey Strahm, Rumman Chowdhury

TL;DR

The paper argues for publicly accountable frontier LLMs through the ASPIRE external-scrutiny framework, which sets out six requirements for effective independent evaluation: Access, Searching attitude, Proportionality, Independence, Resources, and Expertise. It positions external scrutiny as essential to providing reliable information for policymakers, civil society, and users, and outlines how scrutiny can be integrated across the AI lifecycle (development, pre-deployment, post-deployment). Drawing on governance lessons and offering concrete policy recommendations, the work aims to democratize oversight while acknowledging practical limitations and the need for diverse expertise. This approach seeks to reduce information asymmetries and better align frontier LLM development with public-interest safeguards.

Abstract

With the increasing integration of frontier large language models (LLMs) into society and the economy, decisions related to their training, deployment, and use have far-reaching implications. These decisions should not be left solely in the hands of frontier LLM developers. LLM users, civil society, and policymakers need trustworthy sources of information to steer such decisions for the better. Involving outside actors in the evaluation of these systems (what we term 'external scrutiny') via red-teaming, auditing, and external researcher access offers a solution. Though there are encouraging signs of increasing external scrutiny of frontier LLMs, its success is not assured. In this paper, we survey six requirements for effective external scrutiny of frontier AI systems and organize them under the ASPIRE framework: Access, Searching attitude, Proportionality to the risks, Independence, Resources, and Expertise. We then illustrate how external scrutiny might function throughout the AI lifecycle and offer recommendations to policymakers.