Provably safe systems: the only path to controllable AGI
Max Tegmark, Steve Omohundro
TL;DR
This paper argues that the only robust route to controllable AGI is through provably safe systems built on formal verification, mechanistic interpretability, and proof-c carrying constructs. It introduces a hierarchy—PCS, PCC, PCH, and PMC—and outlines how AI can discover algorithms while proofs guarantee safety, with verification enforced by hardware-level contracts and governance. It details current methods, potential progress via automated theorem proving, and the role of MI in translating learned knowledge into verifiable code, while highlighting the urgent need for challenge problems across verification, hardware, governance, and tamper-resistance. The work emphasizes a security-minded, infrastructure-wide approach to AI safety, with measurable, provable guarantees and automated tooling designed to prevent adversarial exploitation and existential risks.
Abstract
We describe a path to humanity safely thriving with powerful Artificial General Intelligences (AGIs) by building them to provably satisfy human-specified requirements. We argue that this will soon be technically feasible using advanced AI for formal verification and mechanistic interpretability. We further argue that it is the only path which guarantees safe controlled AGI. We end with a list of challenge problems whose solution would contribute to this positive outcome and invite readers to join in this work.
