Testing the Machine Consciousness Hypothesis

Stephen Fitz

TL;DR

The paper addresses whether consciousness can arise in machines within a substrate-free, functionalist framework. It proposes an in silico program where distributed predictive agents atop a universal cellular automaton develop collective self-models through lossy communication, using an information-geometric and topological toolkit to diagnose consciousness-like coherence. Key contributions include a concrete architecture of transformer-based cortical columns on a simple substrate, formal metrics for integration, reflexivity, temporality, and causal efficacy, and a design space linking substrate dynamics to collective selfhood. The work aims to shift consciousness research toward empirically testable, substrate-agnostic mechanisms with broad interdisciplinary implications.

Abstract

The Machine Consciousness Hypothesis states that consciousness is a substrate-free functional property of computational systems capable of second-order perception. I propose a research program to investigate this idea in silico by studying how collective self-models (coherent, self-referential representations) emerge from distributed learning systems embedded within universal self-organizing environments. The theory outlined here starts from the supposition that consciousness is an emergent property of collective intelligence systems undergoing synchronization of prediction through communication. It is not an epiphenomenon of individual modeling but a property of the language that a system evolves to internally describe itself. For a model of base reality, I begin with a minimal but general computational world: a cellular automaton, which exhibits both computational irreducibility and local reducibility. On top of this computational substrate, I introduce a network of local, predictive, representational (neural) models capable of communication and adaptation. I use this layered model to study how collective intelligence gives rise to self-representation as a direct consequence of inter-agent alignment. I suggest that consciousness does not emerge from modeling per se, but from communication. It arises from the noisy, lossy exchange of predictive messages between groups of local observers describing persistent patterns in the underlying computational substrate (base reality). It is through this representational dialogue that a shared model arises, aligning many partial views of the world. The broader goal is to develop empirically testable theories of machine consciousness, by studying how internal self-models may form in distributed systems without centralized control.
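
The layered setup described in the abstract (a cellular-automaton substrate, local predictive observers on top of it, and a lossy channel between them) can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: Rule 110 is chosen only because it is a well-known universal elementary cellular automaton, the count-based lookup tables stand in for the neural predictors, and the bit-flip channel is one simple way to model noisy, lossy messages. The names `step`, `lossy`, `Observer`, and `run` are hypothetical.

```python
import numpy as np

# Rule 110 lookup table: index = 4*left + 2*centre + right (assumed substrate).
RULE = np.array([(110 >> i) & 1 for i in range(8)], dtype=np.uint8)


def step(cells: np.ndarray) -> np.ndarray:
    """One synchronous Rule 110 update with periodic boundaries."""
    return RULE[4 * np.roll(cells, 1) + 2 * cells + np.roll(cells, -1)]


def lossy(bits: np.ndarray, flip_prob: float, rng: np.random.Generator) -> np.ndarray:
    """Toy lossy channel: flip each bit independently with probability flip_prob."""
    flips = rng.random(bits.shape) < flip_prob
    return bits ^ flips.astype(np.uint8)


class Observer:
    """Local predictor that sees only the cells in [lo, hi) of the substrate."""

    def __init__(self, lo: int, hi: int):
        self.lo, self.hi = lo, hi
        # Smoothed counts: counts[neighbourhood, next_state], all starting at 1.
        self.counts = np.ones((8, 2))

    def predict(self, padded: np.ndarray):
        """Predict the next state of the window from the window plus two received cells."""
        idx = 4 * padded[:-2] + 2 * padded[1:-1] + padded[2:]
        return idx, self.counts[idx].argmax(axis=1).astype(np.uint8)

    def learn(self, idx: np.ndarray, target: np.ndarray) -> None:
        """Update the counts with the observed outcomes."""
        np.add.at(self.counts, (idx, target), 1)


def run(n_cells=64, n_obs=8, steps=200, flip_prob=0.1, seed=0) -> np.ndarray:
    """Return the per-step fraction of cells whose next state was predicted correctly."""
    rng = np.random.default_rng(seed)
    cells = rng.integers(0, 2, n_cells, dtype=np.uint8)
    width = n_cells // n_obs
    observers = [Observer(i * width, (i + 1) * width) for i in range(n_obs)]
    accuracy = []
    for _ in range(steps):
        nxt = step(cells)
        correct = 0
        for ob in observers:
            # Neighbouring observers report their boundary cells over the lossy channel.
            edges = np.array([cells[ob.lo - 1], cells[ob.hi % n_cells]])
            msg = lossy(edges, flip_prob, rng)
            padded = np.concatenate(([msg[0]], cells[ob.lo:ob.hi], [msg[1]]))
            idx, pred = ob.predict(padded)
            target = nxt[ob.lo:ob.hi]
            correct += int((pred == target).sum())
            ob.learn(idx, target)
        accuracy.append(correct / n_cells)
        cells = nxt
    return np.array(accuracy)


if __name__ == "__main__":
    for p in (0.0, 0.2):
        print(f"flip_prob={p}: mean accuracy {run(flip_prob=p).mean():.3f}")
```

With a noise-free channel each observer can learn the local rule exactly, so its errors are confined to the first time it meets each neighbourhood. With a noisy channel only the boundary cells of each window are affected, which makes the cost of lossy communication directly measurable. The sketch does not attempt the collective self-models or the diagnostic metrics that the paper proposes.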

Paper Structure

This paper contains 7 sections, 12 equations, 6 figures.

Figures (6)

  • Figure 1: Ernst Mach's self-portrait, also known as the "view from the left eye", first published in 1886 in Beiträge zur Analyse der Empfindungen (known in English as The Analysis of Sensations), where it illustrates his ideas about self-perception. It was popularized by Douglas Harding as part of his "headless" perspective of awareness as described in On Having No Head, symbolizing the direct, first-person recognition of consciousness without an observer. Harding used it to depict the transition from identifying with the contents of thought to recognizing the structure of consciousness as an open, selfless field through which experience unfolds.
  • Figure 2: Gottfried Wilhelm Leibniz's design for the Stepped Reckoner (1694), symbolizing the mechanization of reasoning and the origins of computationalism. This was the first calculating machine able to do all four arithmetic operations. The machine has two main components: a 12-digit accumulator at the back and an 8-digit input section at the front. The input section can be shifted with the crank and worm gear to align its digits with those of the accumulator. The eight small dials set the operand, while the telephone-style dial sets the multiplier. Turning the main crank carries out the computation, with the result displayed in the twelve windows of the accumulator.
  • Figure 3: M.C. Escher's Print Gallery (1956), depicting a recursive loop in which the observer and the observed world fold into one another. The image symbolizes second-order perception, mirroring the recursive structure of consciousness. A computational system's awareness of its own representational process lies at the core of the Machine Consciousness Hypothesis.
  • Figure 4: Sir Roger Penrose's diagram of the three interrelated worlds (the Platonic, the physical, and the mental) and the profound mysteries that connect them (from The Road to Reality). The figure captures the central philosophical problem of consciousness as the interface linking these perspectives.
  • Figure 5: Comparative cytoarchitecture of sensory and motor areas in the human cortex. Three drawings by Santiago Ramón y Cajal, reproduced from Comparative Study of the Sensory Areas of the Human Cortex (in Histologie du système nerveux de l'homme et des vertébrés, 1909, Vol. II, pp. 314, 361, 363). Left: Nissl-stained section of the adult human visual cortex, highlighting the dense granular layer IV typical of sensory input regions. Middle: Nissl-stained motor cortex, showing the agranular organization and large pyramidal neurons of layer V. Right: Golgi-stained cortex of a 1½-month-old infant, revealing immature dendritic arborization and incomplete laminar differentiation. Cajal's observations illustrated that, despite regional variation, each patch of cortex follows a shared canonical plan: the cortical column, a modular microcircuit capable of learning and updating local predictive models from sensory input. The human cortex contains on the order of $10^8$ such columns, collectively forming a distributed intelligence system through dense horizontal and feedback connectivity. This principle of locally adaptive yet globally coordinated computation provides the biological inspiration for the approach I present here. I suggest using transformer networks as an analogue of cortical columns: general-purpose predictive units embedded in a self-organizing substrate.
  • ...and 1 more figure