Towards AI-$45^{\circ}$ Law: A Roadmap to Trustworthy AGI
Chao Yang, Chaochao Lu, Yingchun Wang, Bowen Zhou
TL;DR
The paper addresses the mismatch between rapid growth in AI capability and the pace of safety oversight by proposing a balanced roadmap to trustworthy AGI. It introduces the AI-$45^{\circ}$ Law, which advocates improving capability and safety in parallel along a $45^{\circ}$ line, with Red Lines marking existential risks and a Yellow Line marking the threshold for proactive mitigation. It then presents the Causal Ladder of Trustworthy AGI, which comprises three layers (Approximate Alignment, Intervenable, and Reflectable) together with the notions of Endogenous and Exogenous trustworthiness, along with a Matrix of Trustworthy AGI that defines five levels of trustworthiness: Perception, Reasoning, Decision-making, Autonomy, and Collaboration. The approach combines current foundation-model practices (e.g., RLHF, mechanistic interpretability, world models, counterfactual reasoning) with governance mechanisms to support the safe scaling of AGI. Overall, the framework offers a concrete, multi-layered path toward highly capable, ethically aligned AI systems that can operate safely in real-world settings.
Abstract
Ensuring that Artificial General Intelligence (AGI) reliably avoids harmful behaviors is a critical challenge, especially for systems with high autonomy or in safety-critical domains. Despite various safety assurance proposals and warnings of extreme risk, comprehensive guidelines for balancing AI safety and capability remain lacking. In this position paper, we propose the \textit{AI-\textbf{$45^{\circ}$} Law} as a guiding principle for a balanced roadmap toward trustworthy AGI, and introduce the \textit{Causal Ladder of Trustworthy AGI} as a practical framework. Inspired by Judea Pearl's ``Ladder of Causation'', this framework provides a systematic taxonomy and hierarchical structure for current AI capability and safety research. The Causal Ladder comprises three core layers: the Approximate Alignment Layer, the Intervenable Layer, and the Reflectable Layer. These layers address the key challenges of safety and trustworthiness in AGI and contemporary AI systems. Building upon this framework, we define five levels of trustworthy AGI: perception, reasoning, decision-making, autonomy, and collaboration trustworthiness. These levels represent distinct yet progressive aspects of trustworthy AGI. Finally, we present a series of potential governance measures to support the development of trustworthy AGI.
