Probing for Consciousness in Machines

Mathis Immertreu, Achim Schilling, Andreas Maier, Patrick Krauss

TL;DR

This research provides foundational insights into the capabilities of artificial agents in mirroring aspects of human consciousness, with implications for future advancements in artificial intelligence.

Abstract

This study explores the potential for artificial agents to develop core consciousness, as proposed by Antonio Damasio's theory of consciousness. According to Damasio, the emergence of core consciousness relies on the integration of a self model, informed by representations of emotions and feelings, and a world model. We hypothesize that an artificial agent, trained via reinforcement learning (RL) in a virtual environment, can develop preliminary forms of these models as a byproduct of its primary task. The agent's main objective is to learn to play a video game and explore the environment. To evaluate the emergence of world and self models, we employ probes: feedforward classifiers that use the activations of the trained agent's neural networks to predict the spatial position of the agent itself. Our results demonstrate that the agent can form rudimentary world and self models, suggesting a pathway toward developing machine consciousness. This research provides foundational insights into the capabilities of artificial agents in mirroring aspects of human consciousness, with implications for future advancements in artificial intelligence.

Paper Structure

This paper contains 9 sections, 6 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Simplified representation of Damasio's model of consciousness (taken from 10.3389/fncom.2020.556544): The protoself operates at an unconscious level, processing emotions and sensory input. Core consciousness emerges from the protoself, creating the initial self and world models, allowing the self to relate to its environment. Projections of emotions evolve into higher-order feelings. With access to memory and the integration of complex functions such as language processing, extended consciousness develops, further enhancing the self and world models.
  • Figure 2: Schematized overview of the approach: (1) An agent is trained with RL. (2) A dataset of the trained agent's positions and its neural network's activations is sampled. (3) For each layer, a probe is trained on that layer's activations to predict the true position. (4) If one of the probes can predict the true agent position with an accuracy significantly higher than chance, the necessary information is contained in the activations. Thus the agent developed a world model.
  • Figure 3: The basic agent-environment interaction cycle: The agent observes the current state/observation $s_t/o_t$, decides on an action $a_t$ based on its policy and the environment reacts to this action by returning a reward $r_t$ and the next state/observation $s_{t+1}/o_{t+1}$ beginning the next cycle.
  • Figure 4: An example of the ultimate type map: The small figure is the agent, the staircases up and down are the start and goal, respectively, the eyes show uncovered teleportation traps, and the bones are the remains of a defeated monster. The dark gray areas have already been visited by the agent, and the light gray 3x3 crop around the agent is the area it has just discovered.
  • Figure 5: A simplified illustration of the agent architecture: The map (dashed circle) was only included as input in the first experiment and the LSTM cell (dotted rounded rectangle) became part of the architecture from the second experiment onward.
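
The probing step summarized in the Figure 2 caption can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the activations are synthetic stand-ins generated by the hypothetical helper `make_synthetic_dataset`, the position is discretized into 9 hypothetical cells, and the probe is simplified to a single-layer softmax classifier in NumPy rather than the feedforward classifiers used in the paper. The sketch only shows the logic of the test: train a probe on activations, then compare its held-out accuracy with chance.

```python
import numpy as np


def make_synthetic_dataset(n_samples=2000, n_positions=9, n_units=32,
                           noise=0.5, seed=0):
    """Hypothetical stand-in for (activations, position) pairs.

    Each position class gets a random code vector; the "activations" are
    that code plus Gaussian noise. A real dataset would instead be sampled
    from a trained agent.
    """
    rng = np.random.default_rng(seed)
    positions = rng.integers(0, n_positions, size=n_samples)
    codes = rng.normal(size=(n_positions, n_units))
    activations = codes[positions] + noise * rng.normal(size=(n_samples, n_units))
    return activations, positions


def train_probe(x, y, n_classes, lr=0.1, epochs=300):
    """Fit a softmax (linear) probe with full-batch gradient descent."""
    w = np.zeros((x.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(x)  # gradient of mean cross-entropy
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b


def probe_accuracy(w, b, x, y):
    """Fraction of samples whose predicted position class is correct."""
    return float(np.mean(np.argmax(x @ w + b, axis=1) == y))


n_classes = 9
acts, pos = make_synthetic_dataset(n_positions=n_classes)
split = 1500  # first 1500 samples for training, the rest held out
w, b = train_probe(acts[:split], pos[:split], n_classes=n_classes)
test_acc = probe_accuracy(w, b, acts[split:], pos[split:])
chance = 1.0 / n_classes

print(f"held-out probe accuracy: {test_acc:.3f} (chance: {chance:.3f})")
```

In a real analysis, one such probe would be trained per layer, and a held-out accuracy significantly above chance would be read as evidence that position information is decodable from that layer's activations.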