AI Behind Closed Doors: a Primer on The Governance of Internal Deployment

Charlotte Stix, Matteo Pistillo, Girish Sastry, Marius Hobbhahn, Alejandro Ortega, Mikita Balesni, Annika Hallensleben, Nix Goldowsky-Dill, Lee Sharkey

TL;DR

This paper argues that internal deployment of frontier AI systems—those developed and used within the deploying organization—constitutes a critical governance blind spot with potentially outsized societal risks. It develops a framework to characterize internal deployment, identifies two high-impact threat scenarios (loss of control via misaligned scheming and unchecked power concentration), and surveys existing AI governance frameworks and safety-critical-industry practices for applicable governance patterns. It then proposes a defense-in-depth blueprint comprising Frontier Safety Policies with tripwires, internal usage policies, and an oversight framework, plus targeted transparency and disaster-resilience planning, implemented via dedicated internal bodies (IDT and IDOB). The work aims to catalyze decision-making in industry and government by providing a first prototype for governance of internal deployment and highlighting opportunities for public-private cooperation to enhance safety and resilience.

Abstract

The most advanced future AI systems will first be deployed inside the frontier AI companies developing them. According to these companies and independent experts, AI systems may reach or even surpass human intelligence and capabilities by 2030. Internal deployment is, therefore, a key source of benefits and risks from frontier AI systems. Despite this, governance of the internal deployment of highly advanced frontier AI systems appears to be absent. This report aims to address this absence by priming a conversation around the governance of internal deployment. It presents a conceptualization of internal deployment, lessons from other sectors, reviews of existing legal frameworks and their applicability, and illustrative examples of the scenarios we are most concerned about. Specifically, it discusses the risks associated with loss of control via the internal application of a misaligned AI system to the AI research and development pipeline, and with unconstrained and undetected power concentration behind closed doors. The report culminates with a small number of targeted recommendations that provide a first blueprint for the governance of internal deployment.

Paper Structure

This paper contains 43 sections, 7 figures.

Figures (7)

  • Figure A: This figure shows a representation of a self-reinforcing loop (in red). It demonstrates how internally deployed AI systems are used to help automate AI R&D, initially alongside human researchers. These AI R&D efforts culminate in a more capable AI system, which can be deployed as a new, improved, automated researcher. This cycle keeps repeating, resulting in a self-reinforcing loop.
  • Figure B: This figure represents how a self-reinforcing loop (in red) could go unchallenged and undetected in the absence of meaningful governance interventions for internal deployment.
  • Figure C: Summary of our review of internal deployment in other safety-critical industries.
  • Figure D: Swiss cheese model representing our recommended defense-in-depth strategy against the risk of loss of control via internally deployed misaligned AI (see the section on loss of control via automated AI R&D). Threat vectors are in red.
  • Figure E: Swiss cheese model representing our recommended defense-in-depth strategy against the risk of undetected and unconstrained power accumulation (see the section on undetected and unconstrained power accumulation through an internal intelligence explosion). Threat vectors are in red.
  • ...and 2 more figures