Robots Can Multitask Too: Integrating a Memory Architecture and LLMs for Enhanced Cross-Task Robot Action Generation

Hassan Ali; Philipp Allgeuer; Carlo Mazzola; Giulia Belgiovine; Burak Can Kaplan; Lukáš Gajdošech; Stefan Wermter

TL;DR

The paper proposes a dual-layered architecture with two LLMs that exploits their complementary skills of reasoning and instruction following, combined with a memory model inspired by human cognition. It demonstrates the potential of integrating memory with LLMs to combine the robot's action and perception for adaptive task execution.

Abstract

Large Language Models (LLMs) have recently been used in robot applications to ground LLM common-sense reasoning in the robot's perception and physical abilities. In humanoid robots, memory also plays a critical role in fostering real-world embodiment and facilitating long-term interactive capabilities, especially in multi-task setups where the robot must remember previous task states, environment states, and executed actions. In this paper, we address the integration of memory processes with LLMs to generate cross-task robot actions while the robot switches effectively between tasks. Our proposed dual-layered architecture features two LLMs, utilizing their complementary skills of reasoning and following instructions, combined with a memory model inspired by human cognition. Our results show a significant improvement in performance over a baseline of five robotic tasks, demonstrating the potential of integrating memory with LLMs for combining the robot's action and perception for adaptive task execution.

Paper Structure (15 sections, 6 figures, 4 tables)

Introduction
Related Work
Methodology
Robotic Platform: NICOL
NICOL's Visual Perception and Action Parsing
Proposed Architecture
LLMs for Instructions and Reasoning
Working Memory
Declarative Memory
Experiments and Evaluation
Single and Multi-task Scenarios
Results for Standalone Tasks
Results for Consecutive Tasks
Results for Intervened Tasks
Discussion and Conclusion

Figures (6)

Figure 1: Our setup with the semi-humanoid NICOL robot.
Figure 2: A simplified overview of our system's workflow. The workflow starts when the NICOL robot receives a given task. The robot's sensory and task inputs are then fed to our proposed LLM-powered architecture.
Figure 3: Our proposed system architecture consisting of two layers: level 0 and level 1, utilizing the worker and coordinator LLM, respectively.
Figure 4: Our model evaluation scheme has three different task execution modes, each resetting the LLM chat at a different interaction point.
Figure 5: Success rate with both LLMs in the consecutive setup (with and without memory) in comparison to the baseline (standalone setup).
...and 1 more figure
