From Language to Action: Can LLM-Based Agents Be Used for Embodied Robot Cognition?

Shinas Shaji; Fabian Huppertz; Alex Mitrevski; Sebastian Houben

TL;DR

The results demonstrate that the LLM-driven agent can complete structured tasks and exhibits emergent adaptation and memory-guided planning, but they also reveal significant limitations, such as hallucinations about task success and poor instruction following, evident in the agent refusing to acknowledge and complete sequential tasks.

Abstract

In order to act flexibly in an everyday environment, a robotic agent needs a variety of cognitive capabilities that enable it to reason about plans and perform execution recovery. Large language models (LLMs) have been shown to exhibit emergent cognitive capabilities, such as reasoning and language understanding; however, controlling embodied robotic agents requires reliably bridging high-level language and low-level functionalities for perception and control. In this paper, we investigate the extent to which an LLM can serve as a core component for planning and execution reasoning in a cognitive robot architecture. For this purpose, we propose a cognitive architecture in which an agentic LLM serves as the core component for planning and reasoning, while working and episodic memory components support learning from experience and adaptation. An instance of the architecture is then used to control a mobile manipulator in a simulated household environment, where the LLM-based agent interacts with the environment through a set of high-level tools for perception, reasoning, navigation, grasping, and placement. We evaluate the system on two household tasks (object placement and object swapping), which assess the agent's reasoning, planning, and memory utilisation. The results demonstrate that the LLM-driven agent can complete structured tasks and exhibits emergent adaptation and memory-guided planning, but they also reveal significant limitations, such as hallucinations about task success and poor instruction following, evident in the agent refusing to acknowledge and complete sequential tasks. These findings highlight both the potential and the challenges of employing LLMs as embodied cognitive controllers for autonomous robots.
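To make the described architecture more concrete, here is a minimal Python sketch of an agent loop that calls high-level tools and keeps a working memory (the current task's tool trace) alongside an episodic memory (records of past tasks). This is an illustration under stated assumptions, not the authors' implementation: the tool names `perceive`, `navigate_to`, `grasp`, and `place`, the dictionary "world", the `Memory` class with its word-overlap `recall`, and the `scripted_policy` that stands in for the LLM's tool choices are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

# --- Hypothetical high-level tools acting on a toy dictionary "world" ---
def perceive(world: dict, location: str) -> list[str]:
    """Return the objects visible at a location."""
    return sorted(world["objects"].get(location, []))

def navigate_to(world: dict, location: str) -> bool:
    """Move the robot to a known semantic location."""
    if location not in world["locations"]:
        return False
    world["robot_at"] = location
    return True

def grasp(world: dict, obj: str) -> bool:
    """Pick up an object at the robot's current location if the gripper is free."""
    here = world["objects"].get(world["robot_at"], [])
    if obj in here and world["holding"] is None:
        here.remove(obj)
        world["holding"] = obj
        return True
    return False

def place(world: dict, location: str) -> bool:
    """Put the held object down, only if the robot is at that location."""
    if world["holding"] is None or world["robot_at"] != location:
        return False
    world["objects"].setdefault(location, []).append(world["holding"])
    world["holding"] = None
    return True

TOOLS: dict[str, Callable[[dict, str], object]] = {
    "perceive": perceive, "navigate_to": navigate_to, "grasp": grasp, "place": place,
}

@dataclass
class Memory:
    working: list[tuple[str, str, object]] = field(default_factory=list)  # tool trace of the current task
    episodic: list[dict] = field(default_factory=list)  # records of past tasks

    def recall(self, task: str) -> list[dict]:
        """Crude hypothetical retrieval: past episodes sharing any word with the task."""
        words = set(task.lower().split())
        return [ep for ep in self.episodic if words & set(ep["task"].lower().split())]

Policy = Callable[[str, list, list], Optional[tuple[str, str]]]

def run_task(task: str, policy: Policy, world: dict, memory: Memory, max_steps: int = 10) -> bool:
    """Let the policy call tools until it stops, then store the episode."""
    memory.working.clear()
    hints = memory.recall(task)  # relevant past episodes offered to the policy
    for _ in range(max_steps):
        call = policy(task, memory.working, hints)  # stand-in for the LLM's tool choice
        if call is None:  # the agent declares the task finished
            break
        name, arg = call
        memory.working.append((name, arg, TOOLS[name](world, arg)))
    # "Believed" success: no tool reported failure; the world state is not re-checked.
    believed = all(result is not False for _, _, result in memory.working)
    memory.episodic.append({"task": task, "trace": list(memory.working), "believed_success": believed})
    return believed

def scripted_policy(task: str, working: list, hints: list) -> Optional[tuple[str, str]]:
    """Hypothetical fixed plan standing in for an LLM (ignores task and hints)."""
    plan = [("navigate_to", "table"), ("grasp", "cup"), ("navigate_to", "shelf"), ("place", "shelf")]
    return plan[len(working)] if len(working) < len(plan) else None

if __name__ == "__main__":
    world = {"locations": {"table", "shelf"}, "objects": {"table": ["cup"]},
             "robot_at": "table", "holding": None}
    memory = Memory()
    print(run_task("put the cup on the shelf", scripted_policy, world, memory))  # True
    print(perceive(world, "shelf"))  # ['cup']
```

Separating believed success (derived here from tool return values) from the actual world state mirrors the distinction the evaluation draws between a model's believed and actual execution success; a real system would replace `scripted_policy` with an LLM that receives the tool descriptions, the working-memory trace, and the recalled episodes.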

Paper Structure

This paper contains 20 sections, 5 figures, and 2 tables.

INTRODUCTION
RELATED WORK
Cognitive architectures
LLM-based embodied AI
Robot foundation models
Agentic LLMs
METHODOLOGY
Simulated World
Tools for World Interaction
LLM-Driven Cognitive Processes in the Architecture
Episodic memory
Working memory
Perception and action execution
EVALUATION
Evaluation Tasks
...and 5 more sections

Figures (5)

Figure 1: Overview of our proposed system
Figure 2: Illustration of the simulated robot and test environment. Green dots are semantic locations that the agent can navigate to via the connected lines.
Figure 3: Confusion matrices comparing the models' believed execution success with their actual execution success
Figure 4: Ground-truth success rate of the models as the number of executions added to the episodic memory increases
Figure 5: Tool calls of the models as the number of executions accumulated in the episodic memory increases
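Figure 3 contrasts what each model believed about its execution success with what actually happened. As an illustrative sketch only, the following Python tallies such a 2×2 confusion matrix from (believed, actual) outcome pairs and computes the share of claimed successes that were in fact failures (hallucinated success). The function names and the toy run records are hypothetical; they are not data or code from the paper.

```python
from collections import Counter

def success_confusion(runs: list[tuple[bool, bool]]) -> list[list[int]]:
    """2x2 matrix from (believed_success, actual_success) pairs.

    Rows: actual success, actual failure. Columns: believed success, believed failure.
    """
    counts = Counter((bool(believed), bool(actual)) for believed, actual in runs)
    return [
        [counts[(True, True)], counts[(False, True)]],
        [counts[(True, False)], counts[(False, False)]],  # (True, False) = hallucinated success
    ]

def hallucinated_success_rate(runs: list[tuple[bool, bool]]) -> float:
    """Share of runs reported as successful that actually failed."""
    actual_when_claimed = [actual for believed, actual in runs if believed]
    if not actual_when_claimed:
        return 0.0
    return sum(not actual for actual in actual_when_claimed) / len(actual_when_claimed)

# Toy values for illustration only, not results from the paper.
toy_runs = [(True, True), (True, False), (True, False), (False, False), (True, True)]
print(success_confusion(toy_runs))  # [[2, 0], [2, 1]]
print(hallucinated_success_rate(toy_runs))  # 0.5
```

A large off-diagonal entry in the "actual failure, believed success" cell is the pattern that corresponds to hallucinated task success.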
