Minimal Self in Humanoid Robot "Alter3" Driven by Large Language Model

Takahide Yoshida, Suzune Baba, Atsushi Masumori, Takashi Ikegami

TL;DR

This work investigates whether coupling a Large Language Model with a humanoid robot (Alter3) can yield a minimal sense of self, comprising agency and ownership. By translating language into motion via a two-prompt CoT pipeline, Alter3 autonomously generates 43-axis motions and demonstrates socially relevant behaviors. The authors probe agency with a mirror test and ownership with a Rubber Hand Illusion, finding that agency emerges at a judgment level while full self-recognition and robust ownership remain elusive without proprioceptive sensing. The study highlights the potential and limitations of disembodied LLMs guiding embodied agents and suggests integrating tactile and proprioceptive feedback to approach a dynamic minimal self in robots.

Abstract

This paper introduces Alter3, a humanoid robot that demonstrates spontaneous motion generation through the integration of GPT-4, a Large Language Model (LLM). This integration overcomes challenges in applying language models to direct robot control. By translating linguistic descriptions into actions, Alter3 can autonomously perform various tasks. A key aspect of humanoid robots is their ability to mimic human movement and emotions, which allows them to leverage the human knowledge captured in language models. This raises the question of whether Alter3+GPT-4 can develop a "minimal self" with a sense of agency and ownership. This paper introduces mirror self-recognition and rubber hand illusion tests to assess Alter3's potential for a sense of self. The research suggests that even disembodied language models can develop agency when coupled with a physical robotic platform.

Paper Structure (15 sections, 7 figures)

Figures (7)

  • Figure 1: Body of Alter3. The body has 43 axes that are controlled by air actuators. The control system sends commands via a serial port to control the body. The refresh rate is 100–150 ms.
  • Figure 2: A procedure to control the Alter3 humanoid using verbal instructions. A natural-language instruction is passed through prompt-1 and then prompt-2, which outputs Python code to control Alter3. The architecture is based on CoT. The prompts are detailed in the Appendix.
  • Figure 3: (a) Take a selfie. (b) Pretend to be a ghost. The LLM can generate emotional expressions associated with specific movements. For example, in the case of a selfie, Alter3 is showing a smile.
  • Figure 4: A snapshot of a generated sequence of movements: “I was enjoying a movie while eating popcorn at the theater when I realized that I was actually eating the popcorn of the person next to me”. The LLM can generate movements that progress over time like a story. Left: The action of eating popcorn. Center: Noticing the person next to Alter3. Right: Getting panicked.
  • Figure 5: (a) The setup of the mirror self-recognition (MSR) experiment. (b) System architecture for the MSR experiment. Alter3 uses three different tools to determine whether it has control of its own body. The prompts are detailed in the Appendix.
  • ...and 2 more figures
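
The two-prompt procedure in Figure 2 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' code: the prompt wording, the `llm` callable, the "axis value" line format, the 1-to-43 axis numbering, and the 0-to-255 value range are all hypothetical. Only the 43-axis body and the prompt-1 then prompt-2 ordering come from the figure captions. A canned `fake_llm` stands in for GPT-4 so the sketch runs offline.

```python
from typing import Callable, List, Tuple

NUM_AXES = 43          # Figure 1: the body has 43 axes
MAX_VALUE = 255        # assumed actuator value range (hypothetical)

AxisCommand = Tuple[int, int]  # (axis number, target value)


def parse_commands(text: str) -> List[AxisCommand]:
    """Keep only well-formed 'axis value' lines that are within range."""
    commands: List[AxisCommand] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            axis, value = int(parts[0]), int(parts[1])
        except ValueError:
            continue
        if 1 <= axis <= NUM_AXES and 0 <= value <= MAX_VALUE:
            commands.append((axis, value))
    return commands


def instruction_to_commands(
    llm: Callable[[str], str], instruction: str
) -> List[AxisCommand]:
    """Two-stage pipeline: instruction -> motion description -> axis commands."""
    # Stage 1 (stands in for prompt-1): describe the motion in words.
    description = llm(f"STAGE1 Describe the body movements for: {instruction}")
    # Stage 2 (stands in for prompt-2): turn the description into axis commands.
    raw = llm(f"STAGE2 Write one 'axis value' line per movement:\n{description}")
    return parse_commands(raw)


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model so the sketch runs offline."""
    if prompt.startswith("STAGE1"):
        return "1. Raise the right arm.\n2. Smile."
    # Includes an out-of-range axis and a malformed line on purpose.
    return "12 200\n3 255\n99 10\nnot a command"


if __name__ == "__main__":
    print(instruction_to_commands(fake_llm, "take a selfie"))
```

Validating the second-stage output before it reaches the actuators is a design choice of this sketch: generated text may contain malformed or out-of-range lines, so the parser drops them instead of forwarding them to the robot.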