International Agreements on AI Safety: Review and Recommendations for a Conditional AI Safety Treaty
Rebecca Scholefield, Samuel Martin, Otto Barten
TL;DR
The paper addresses existential-risk concerns around advanced general-purpose AI (GPAI) and surveys proposals for international safety governance made from 2023 onward. It analyzes risk thresholds, regulatory mechanisms, and types of international agreement, highlighting compute-based thresholds as a widely supported initial filter and noting support for audits, information security, governance, and verification practices drawn from high-risk industries. It presents a concrete recommendation: a conditional AI safety treaty centered on a compute threshold, with an international network of AI Safety Institutes (AISIs) conducting model evaluations and security and governance audits, plus reporting from cloud providers and supply-chain verification, supported by incident reporting and strategic incentives. The work underscores the need for pragmatic, adaptable governance that can evolve with rapid AI advances while leveraging existing international norms and high-risk-industry practices to mitigate systemic risks that cross borders and jurisdictions.
Abstract
The malicious use or malfunction of advanced general-purpose AI (GPAI) poses risks that, according to leading experts, could lead to the 'marginalisation or extinction of humanity.' To address these risks, a growing number of proposals for international agreements on AI safety have been put forward. In this paper, we review recent proposals (from 2023 onward), identifying areas of consensus and disagreement, and drawing on related literature to assess their feasibility. We focus our discussion on risk thresholds, regulations, types of international agreement and five related processes: building scientific consensus, standardisation, auditing, verification and incentivisation. Based on this review, we propose a treaty establishing a compute threshold above which development requires rigorous oversight. This treaty would mandate complementary audits of models, information security and governance practices, overseen by an international network of AI Safety Institutes (AISIs) with authority to pause development if risks are unacceptable. Our approach combines immediately implementable measures with a flexible structure that can adapt to ongoing research.
