AGITB: A Signal-Level Benchmark for Evaluating Artificial General Intelligence

Matej Šprogar

TL;DR

AGITB is a signal-level benchmark for artificial general intelligence that measures a model's ability to forecast the next input in temporal binary sequences without pretraining or semantic grounding. It defines fourteen interdependent requirements, including an unbiased initial state, determinism, and temporal adaptability, and applies an all-or-nothing, self-referential evaluation across 100 trials. The analysis contrasts humans, symbolic programs, artificial neural networks, and large language models, and finds that no current AI system satisfies all criteria, which highlights the remaining gap to AGI. Positioned against ARC and NeuroBench, AGITB emphasises low-level, cortex-inspired invariants grounded in neural processing principles, and provides an open-source reference implementation to guide progress toward general, adaptive learning. The framework enables principled, interpretable assessment of generality beyond task-specific performance, with potential implications for NeuroAI and embodied cognition research.

Abstract

Current AI systems demonstrate remarkable capabilities yet remain specialised, in part because no unified measure of general intelligence has been established. Existing evaluation frameworks, which focus primarily on language or perception tasks, offer limited insight into generality. The Artificial General Intelligence Testbed (AGITB) introduces a complementary benchmarking suite of fourteen elementary tests, with thirteen implemented as fully automated procedures. AGITB evaluates models on their ability to forecast the next input in a temporal sequence, step by step, without pretraining, symbolic manipulation, or semantic grounding. The framework isolates core computational invariants, such as determinism, sensitivity, and generalisation, that parallel principles of biological information processing. Designed to resist brute-force or memorisation-based strategies, AGITB enforces unbiased and autonomous learning. The human cortex satisfies all tests, whereas no current AI system meets the full AGITB criteria, demonstrating its value as a rigorous, interpretable, and actionable benchmark for evaluating progress toward artificial general intelligence. A reference implementation of AGITB is freely available on GitHub.

Paper Structure

This paper contains 23 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: (a) Example of a 10-bit input $x$ with four bits set. (b) Example of an input sequence.
  • Figure 2: Iterative adaptation in discrete time. At the previous time step ($t-1$), the model issued the prediction $x_t^{*}$. After observing the realised input $x_t$, it adapted its internal state in response to the error in the second bit and subsequently produced the one-step-ahead prediction $x_{t+1}^{*}$.
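The predict, observe, adapt cycle described in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AGITB reference implementation: the class `LastInputModel`, the helpers `bit_errors` and `run`, and the example sequence are all hypothetical, and the placeholder model simply repeats the last input it saw. Inputs are 10-bit patterns with four bits set, as in the Figure 1 example.

```python
from typing import List

BITS = 10  # input width, taken from the Figure 1 example


class LastInputModel:
    """Hypothetical placeholder model: predicts that the next input repeats the last one."""

    def __init__(self) -> None:
        self.state = 0  # no bits set before any input has been observed

    def predict(self) -> int:
        # One-step-ahead prediction x*_{t+1}, derived from the current state.
        return self.state

    def adapt(self, observed: int) -> None:
        # Update the internal state after observing the realised input x_t.
        self.state = observed


def bit_errors(prediction: int, observed: int) -> int:
    """Count the bit positions where the prediction and the observed input differ."""
    mask = (1 << BITS) - 1
    return bin((prediction ^ observed) & mask).count("1")


def run(model: LastInputModel, sequence: List[int]) -> List[int]:
    """Feed the sequence one input at a time and record the per-step bit errors."""
    errors = []
    for x_t in sequence:
        x_pred = model.predict()            # prediction issued at the previous step
        errors.append(bit_errors(x_pred, x_t))
        model.adapt(x_t)                    # adapt before issuing the next prediction
    return errors


# Illustrative sequence: the third input differs from the second in the second bit only.
sequence = [0b0100100110, 0b0100100110, 0b0000100110]
errors = run(LastInputModel(), sequence)
print(errors)
```

The loop makes the ordering explicit: each prediction is issued before the corresponding input is seen, and adaptation happens only after the error against the realised input is known.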