Towards AI-$45^{\circ}$ Law: A Roadmap to Trustworthy AGI

Chao Yang, Chaochao Lu, Yingchun Wang, Bowen Zhou

TL;DR

The paper tackles the mismatch between rapid AI capability growth and safety oversight by proposing a balanced roadmap to trustworthy AGI. It introduces the AI-$45^{\circ}$ Law, advocating parallel improvements in capability and safety along a $45^{\circ}$ line, with Red Lines marking existential risks and a Yellow Line signaling proactive mitigation thresholds. It then presents the Causal Ladder of Trustworthy AGI—three layers (Approximate Alignment, Intervenable, Reflectable) and Endogenous/Exogenous trustworthiness—along with a Matrix of Trustworthy AGI that defines five levels of trustworthiness: Perception, Reasoning, Decision-making, Autonomy, and Collaboration. The approach synthesizes current foundation-model practices (e.g., RLHF, mechanistic interpretability, world models, counterfactual reasoning) with governance mechanisms to enable safe scaling of AGI. Overall, the framework provides a concrete, multi-layered path toward developing highly capable but ethically aligned AI systems that can operate safely in real-world settings.

Abstract

Ensuring Artificial General Intelligence (AGI) reliably avoids harmful behaviors is a critical challenge, especially for systems with high autonomy or in safety-critical domains. Despite various safety assurance proposals and extreme risk warnings, comprehensive guidelines balancing AI safety and capability remain lacking. In this position paper, we propose the AI-$45^{\circ}$ Law as a guiding principle for a balanced roadmap toward trustworthy AGI, and introduce the Causal Ladder of Trustworthy AGI as a practical framework. This framework provides a systematic taxonomy and hierarchical structure for current AI capability and safety research, inspired by Judea Pearl's "Ladder of Causation". The Causal Ladder comprises three core layers: the Approximate Alignment Layer, the Intervenable Layer, and the Reflectable Layer. These layers address the key challenges of safety and trustworthiness in AGI and contemporary AI systems. Building upon this framework, we define five levels of trustworthy AGI: perception, reasoning, decision-making, autonomy, and collaboration trustworthiness. These levels represent distinct yet progressive aspects of trustworthy AGI. Finally, we present a series of potential governance measures to support the development of trustworthy AGI.

Paper Structure

This paper contains 21 sections, 3 figures.

Figures (3)

  • Figure 1: Left: Illustration of the AI-$45^{\circ}$ Law, which assumes that AI capability and safety should ideally be synchronized, represented by a $45^{\circ}$ line. Under the development of crippled AI, we can further divide the areas of existential risks (Red line) and early warning indicators (Yellow line). Right: Key milestone models in AI capability development.
  • Figure 2: Illustration of the Causal Ladder of Trustworthy AGI: The framework consists of three core layers: Approximate Alignment, Intervenable, and Reflectable. It integrates Endogenous Trustworthiness and Exogenous Trustworthiness to provide a comprehensive approach to ensuring AGI safety and trustworthiness.
  • Figure 3: Matrix of trustworthy AGI: Based on the causal ladder of AGI and the levels of trustworthiness, we illustrate the positions of several representative models within the matrix. Reliance on the Reflectable Layer increases as we progress through the levels of trustworthiness.