Provably safe systems: the only path to controllable AGI

Max Tegmark; Steve Omohundro

Provably safe systems: the only path to controllable AGI

Max Tegmark, Steve Omohundro

TL;DR

This paper argues that the only robust route to controllable AGI is through provably safe systems built on formal verification, mechanistic interpretability, and proof-c carrying constructs. It introduces a hierarchy—PCS, PCC, PCH, and PMC—and outlines how AI can discover algorithms while proofs guarantee safety, with verification enforced by hardware-level contracts and governance. It details current methods, potential progress via automated theorem proving, and the role of MI in translating learned knowledge into verifiable code, while highlighting the urgent need for challenge problems across verification, hardware, governance, and tamper-resistance. The work emphasizes a security-minded, infrastructure-wide approach to AI safety, with measurable, provable guarantees and automated tooling designed to prevent adversarial exploitation and existential risks.

Abstract

We describe a path to humanity safely thriving with powerful Artificial General Intelligences (AGIs) by building them to provably satisfy human-specified requirements. We argue that this will soon be technically feasible using advanced AI for formal verification and mechanistic interpretability. We further argue that it is the only path which guarantees safe controlled AGI. We end with a list of challenge problems whose solution would contribute to this positive outcome and invite readers to join in this work.

Provably safe systems: the only path to controllable AGI

TL;DR

Abstract

Paper Structure (30 sections, 2 figures)

This paper contains 30 sections, 2 figures.

Introduction
The path to provably safe AI
Mathematical Proof
Proof-Carrying Code
Automated Software Verification
Program Synthesis and Mechinterp
Networks of Provable Contracts
The Only Path To Controllable AGI
Challenge Problems
Automate formal verification:
Develop verification benchmarks:
Develop probabilistic program verification:
Develop quantum formal verification:
Automate mechanistic interpretability:
Develop mechanistic interpretability benchmarks:
...and 15 more sections

Figures (2)

Figure 1: The 2023 MIT Mechanistic Interpretablity Conference
Figure :

Provably safe systems: the only path to controllable AGI

TL;DR

Abstract

Provably safe systems: the only path to controllable AGI

Authors

TL;DR

Abstract

Table of Contents

Figures (2)