An Operational Kardashev-Style Scale for Autonomous AI - Towards AGI and Superintelligence
Przemyslaw Chojecki
TL;DR
The paper proposes a Kardashev-style, operational Autonomous AI (AAI) Scale to quantify progress from fixed automation to AGI and beyond using ten normalized axes and a composite AAI-Index. It introduces a Self-Improvement Coefficient $\kappa$ and two closure properties (maintenance and expansion) to render self-improvement falsifiable, alongside OWA-Bench, an open-world agency benchmark for long-horizon, tool-using, persistent agents. A formal progression framework links AAI-3 to AAI-5 through curvature and link-step progression on a fixed battery, with a CHC-based alignment crosswalk and a non-compensatory battery ensuring breadth and reliability. The approach emphasizes auditable Gates, drift-robust benchmarking, and a dynamics-aware Delegability Frontier to guide governance and deployment decisions, culminating in a Domain Annex for software agents and an Embodiment Annex for robotics. Collectively, the work provides a rigorous, auditable, and scalable framework to evaluate autonomous AI deployment, self-improvement, and progression toward superintelligence while mitigating gaming and enabling reproducibility across domains.
Abstract
We propose a Kardashev-inspired yet operational Autonomous AI (AAI) Scale that measures the progression from fixed robotic process automation (AAI-0) to full artificial general intelligence (AAI-4) and beyond. Unlike narrative ladders, our scale is multi-axis and testable. We define ten capability axes (Autonomy, Generality, Planning, Memory/Persistence, Tool Economy, Self-Revision, Sociality/Coordination, Embodiment, World-Model Fidelity, Economic Throughput) aggregated by a composite AAI-Index (a weighted geometric mean). We introduce a measurable Self-Improvement Coefficient $\kappa$ (capability growth per unit of agent-initiated resources) and two closure properties (maintenance and expansion) that convert "self-improving AI" into falsifiable criteria. We specify OWA-Bench, an open-world agency benchmark suite that evaluates long-horizon, tool-using, persistent agents. We define level gates for AAI-0 through AAI-4 using thresholds on the axes, $\kappa$, and closure proofs. Synthetic experiments illustrate how present-day systems map onto the scale and how the delegability frontier (quality vs. autonomy) advances with self-improvement. We also prove a theorem giving sufficient conditions under which an AAI-3 agent becomes AAI-5 over time, formalizing the intuition that a "baby AGI" becomes a superintelligence.
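As a rough illustration of the two quantities named in the abstract, the sketch below computes a weighted geometric mean over the ten axes and a finite-difference estimate of $\kappa$. It is a minimal sketch under stated assumptions, not the paper's reference implementation: the function names `aai_index` and `kappa`, the axis keys, the default equal weights, and the difference-quotient form of $\kappa$ are all hypothetical choices made here for illustration. Axis scores are assumed to be normalized to $[0, 1]$.

```python
import math

# Axis names follow the abstract; the snake_case keys are an assumption.
AXES = [
    "autonomy", "generality", "planning", "memory_persistence",
    "tool_economy", "self_revision", "sociality_coordination",
    "embodiment", "world_model_fidelity", "economic_throughput",
]


def aai_index(scores, weights=None):
    """Weighted geometric mean of normalized axis scores in [0, 1].

    Equal weights are assumed when none are given; the source does not
    state the weights it uses.
    """
    if weights is None:
        weights = {axis: 1.0 for axis in scores}
    total = sum(weights[axis] for axis in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    for axis, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {axis!r} must lie in [0, 1]")
    # A zero on any positively weighted axis drives the whole index to zero.
    if any(scores[a] == 0.0 and weights[a] > 0 for a in scores):
        return 0.0
    log_sum = sum(
        weights[a] * math.log(scores[a]) for a in scores if weights[a] > 0
    )
    return math.exp(log_sum / total)


def kappa(index_before, index_after, resources):
    """Capability growth per unit of agent-initiated resources.

    Assumed here to be a simple difference quotient; the units of
    `resources` (compute, money, time) are left to the caller.
    """
    if resources <= 0:
        raise ValueError("resources must be positive")
    return (index_after - index_before) / resources


if __name__ == "__main__":
    baseline = {axis: 0.5 for axis in AXES}
    improved = dict(baseline, planning=0.8)
    before, after = aai_index(baseline), aai_index(improved)
    print(f"index before={before:.3f}, after={after:.3f}")
    print(f"kappa={kappa(before, after, resources=10.0):.4f}")
```

One consequence of using a geometric rather than an arithmetic mean is that a score of zero on any positively weighted axis collapses the whole index to zero, so strength on one axis cannot fully offset a complete gap on another.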
