Can Complexity and Uncomputability Explain Intelligence? SuperARC: A Test for Artificial Super Intelligence Based on Recursive Compression
Alberto Hernández-Espinosa, Luan Ozelim, Felipe S. Abrahão, Hector Zenil
TL;DR
This work proposes SuperARC, a human-agnostic benchmarking framework grounded in Algorithmic Information Theory (AIT) that assesses AI models against AGI and ASI claims by focusing on abstraction (compression) and prediction (inference). Built on the Coding Theorem Method (CTM) and the Block Decomposition Method (BDM), the framework combines neurosymbolic methods with pattern-based approaches to quantify an AI model's ability to compress sequences and to generate executable models that reproduce them. The tests reveal that frontier LLMs often rely on memorisation or pattern matching and can regress from one model generation to the next. Results across next-digit tasks, free-form generation, and code-generation experiments show that algorithmic reasoning remains limited in current models, while neurosymbolic baselines can achieve high compression-based understanding, highlighting the need to integrate symbolic reasoning into AI development. The authors discuss open-ended evaluation, Algorithmic Information Dynamics (AID), and policy implications. They advocate a shift toward algorithmic benchmarks that complement traditional human-centric assessments, and they outline practical steps for adoption, governance, and future research.
Abstract
We introduce an increasing-complexity, open-ended, and human-agnostic metric to evaluate foundational and frontier AI models in the context of Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI) claims. Unlike other tests that rely on human-centric questions and expected answers, or on pattern-matching methods, the test introduced here is grounded in the fundamental mathematical areas of randomness and optimal inference. We argue that human-agnostic metrics based on the universal principles established by Algorithmic Information Theory (AIT), which formally frame the concepts of model abstraction and prediction, offer a powerful metrological framework. When applied to frontier models, the leading LLMs outperform most others in multiple tasks, but they do not always do so with their latest model versions, which often regress and appear far from any global maximum or target estimated using the principles of AIT that define a Universal Intelligence (UAI) point and trend in the benchmarking. Conversely, a hybrid neuro-symbolic approach to UAI based on the same principles is shown to outperform frontier specialised prediction models in a simplified but relevant example related to compression-based model abstraction and sequence prediction. Finally, we prove and conclude that predictive power through arbitrary formal theories is directly proportional to compression over the algorithmic space, not the statistical space, and so further progress in AI models can only be achieved in combination with symbolic approaches, which LLM developers are often adopting without acknowledgement or realisation.
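The contrast between compression over the algorithmic space and over the statistical space can be illustrated with a toy comparison. The sketch below is not the SuperARC procedure and does not use CTM or BDM. It assumes `zlib` as a stand-in for a statistical compressor and treats the byte length of a short Python program as a crude upper bound on the algorithmic description length of a sequence. The helper names and the convention that a candidate program must define a variable called `output` are hypothetical, chosen only for this example.

```python
import zlib


def statistical_size(seq: str) -> int:
    """Byte length after zlib compression (assumed proxy for statistical compression)."""
    return len(zlib.compress(seq.encode("ascii"), 9))


def run_model(source: str) -> str:
    """Run a candidate program; by (hypothetical) convention it must define `output`."""
    namespace = {}
    exec(source, namespace)  # toy setting only: never exec untrusted code like this
    return namespace["output"]


# Target sequence: the first 200 perfect squares, comma-separated.
target = ",".join(str(n * n) for n in range(1, 201))

# A candidate "executable model" of the kind a system under test might return.
candidate = 'output = ",".join(str(n * n) for n in range(1, 201))'

# The model only counts if it reproduces the sequence exactly.
reproduces = run_model(candidate) == target

raw_size = len(target.encode("ascii"))       # uncompressed length in bytes
zlib_size = statistical_size(target)         # statistical description length
model_size = len(candidate.encode("ascii"))  # algorithmic description (upper bound)

print(f"reproduces={reproduces} raw={raw_size} zlib={zlib_size} model={model_size}")
```

In this toy case the generating program is far shorter than the zlib output, because zlib can exploit only repeated substrings and symbol frequencies while the program captures the rule behind the sequence. The same program also predicts later terms once its range is extended, which is the informal link between compression and prediction that the abstract refers to.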
