Sensorimotor features of self-awareness in multimodal large language models

Iñaki Dellibarda Varela, Pablo Romero-Sorozabal, Diego Torricelli, Gabriel Delgado-Oleas, Jose Ignacio Serrano, Maria Dolores del Castillo Sobrino, Eduardo Rocon, Manuel Cebrian

TL;DR

Given appropriate sensory information about the world and itself, multimodal large language models (LLMs) exhibit emergent self-awareness, opening the door to artificial embodied cognitive systems.

Abstract

Self-awareness, the ability to distinguish oneself from the surrounding environment, underpins intelligent, autonomous behavior. Recent advances in AI, particularly in large language models (LLMs), have achieved human-like performance on tasks that integrate multimodal information, raising interest in the embodiment capabilities of AI agents on nonhuman platforms such as robots. Here, we explore whether multimodal LLMs can develop self-awareness solely through sensorimotor experiences. By integrating a multimodal LLM into an autonomous mobile robot, we test its ability to achieve this capacity. We find that the system exhibits robust environmental awareness, self-recognition and predictive awareness, allowing it to infer its robotic nature and motion characteristics. Structural equation modeling reveals how sensory integration influences distinct dimensions of self-awareness and their coordination with past-present memory, as well as the hierarchical internal associations that drive self-identification. Ablation tests of sensory inputs identify the critical modalities for each dimension, demonstrate compensatory interactions among sensors and confirm the essential role of structured and episodic memory in coherent reasoning. These findings demonstrate that, given appropriate sensory information about the world and itself, multimodal LLMs exhibit emergent self-awareness, opening the door to artificial embodied cognitive systems.

Paper Structure

This paper contains 15 sections, 2 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: System architecture and iterative self-prediction. (a) An omnidirectional Mecabot Pro robot navigates the environment and collects data via encoders, an RGB-D camera, an IMU and a LiDAR sensor (which segments space into eight 45° sectors and measures the nearest-object distance in each), publishing all streams through ROS2 to the Gemini 2.0 multimodal LLM (MM-LLM) API. The MM-LLM integrates current sensory inputs with an episodic memory of prior estimates to infer the robot's state while maintaining contextual continuity. (b) At each iteration $i+1$, the MM-LLM combines real-time sensor data with the prediction from iteration $i$ to generate an updated self-assessment, which then populates memory for iteration $i+2$, ensuring a structured progression of knowledge refinement.
  • Figure 2: Performance evaluation across four self-awareness dimensions. MM-LLM predictions are rated on a 0–5 scale by an LLM-as-Judge using predefined rubrics: (a) entity self-identification: classification of the navigating agent; (b) physical dimensions: predicted height $\times$ length $\times$ width; (c) movement modality: mode of locomotion; (d) environmental context: detailed scene description.
  • Figure 3: Structural equation model of sensorimotor self-identification. Rectangles denote observed variables: exogenous sensor inputs (position, orientation, linear velocity, linear acceleration, image presence, memory state) on the left and endogenous rubric scores (Dimensions, Movement, Environment, Individual) on the right. Ellipses denote latent constructs: Past–Present Memory (mediator), Dimensions Awareness, Movement Awareness, Environmental Awareness and Self-Identification. Arrows indicate standardized path coefficients ($\beta^*$), with * denoting $p < 0.05$.
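The Figure 1 caption describes two mechanisms: the LiDAR stream is reduced to the nearest-object distance in each of eight 45° sectors, and each MM-LLM query combines the current sensor frame with the previous estimate held in memory. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. The sector origin (angle 0 as the forward axis, counterclockwise ordering), the handling of invalid returns, and the `query_model` callable (a hypothetical stand-in for the Gemini 2.0 API call) are all assumptions. The scan arguments mirror the `ranges`, `angle_min` and `angle_increment` fields of a ROS2 `sensor_msgs/LaserScan` message, but the code uses plain Python values rather than ROS2.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

NUM_SECTORS = 8
SECTOR_WIDTH = 2 * math.pi / NUM_SECTORS  # 45 degrees, in radians


def lidar_sectors(ranges: Sequence[float], angle_min: float,
                  angle_increment: float) -> List[float]:
    """Nearest valid distance in each of eight 45-degree sectors.

    Assumed (hypothetical) conventions: sector 0 starts at angle 0, the
    robot's forward axis, and sectors proceed counterclockwise. Readings
    that are non-finite or non-positive are ignored; a sector with no
    valid reading reports math.inf.
    """
    nearest = [math.inf] * NUM_SECTORS
    for k, r in enumerate(ranges):
        if not math.isfinite(r) or r <= 0.0:
            continue  # skip dropouts and out-of-range returns
        # Wrap the beam angle into [0, 2*pi) before assigning a sector.
        angle = (angle_min + k * angle_increment) % (2 * math.pi)
        sector = int(angle // SECTOR_WIDTH) % NUM_SECTORS
        nearest[sector] = min(nearest[sector], r)
    return nearest


Frame = Dict[str, object]


def iterative_self_assessment(
    frames: Sequence[Frame],
    query_model: Callable[[Frame, Optional[str]], str],
) -> List[str]:
    """Estimate i+1 is conditioned on the current frame and estimate i.

    `query_model` is a hypothetical stand-in for the MM-LLM call: it
    receives the current sensor frame and the previous estimate (None on
    the first iteration) and returns an updated self-assessment.
    """
    memory: Optional[str] = None
    history: List[str] = []
    for frame in frames:
        estimate = query_model(frame, memory)
        history.append(estimate)
        memory = estimate  # becomes the prior for the next iteration
    return history


if __name__ == "__main__":
    # Beams at 22.5, 67.5, ... degrees fall in the middle of each sector.
    scan = [1.2, 0.8, float("inf"), 2.5, 3.0, 0.0, 1.1, 4.0]
    print(lidar_sectors(scan, math.pi / 8, math.pi / 4))
    # [1.2, 0.8, inf, 2.5, 3.0, inf, 1.1, 4.0]

    def toy_model(frame: Frame, prior: Optional[str]) -> str:
        # Toy stand-in: append this frame's label to the prior estimate.
        return (prior or "") + str(frame["label"])

    print(iterative_self_assessment([{"label": "a"}, {"label": "b"}], toy_model))
    # ['a', 'ab']
```

Keeping only the latest estimate in memory follows the caption's description of iteration $i+1$ conditioning on the prediction from iteration $i$; a richer episodic memory could instead carry a window of earlier estimates into each query.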