ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

Christopher E. Mower; Yuhui Wan; Hongzhan Yu; Antoine Grosnit; Jonas Gonzalez-Billandon; Matthieu Zimmer; Jinlong Wang; Xinyu Zhang; Yao Zhao; Anbang Zhai; Puze Liu; Daniel Palenicek; Davide Tateo; Cesar Cadena; Marco Hutter; Jan Peters; Guangjian Tian; Yuzheng Zhuang; Kun Shao; Xingyue Quan; Jianye Hao; Jun Wang; Haitham Bou-Ammar

TL;DR

ROS-LLM presents a ROS-integrated framework that enables non-experts to program embodied robots via natural language prompts, using an expandable atomic-action library and open-source LLMs. It formalizes task execution as a modified MDP and supports multiple behavior representations (sequences, behavior trees, state machines) with imitation learning to extend capabilities and feedback to refine policies. Extensive real-world experiments on a UR5 kitchen setup, long-horizon tasks, remote supervision, and continual learning demonstrate robustness, adaptability, and scalability, with open-source code to support reproducibility. This work advances accessible, flexible, and verifiable robot programming within ROS, enabling broader adoption and collaboration across research and industry.

Abstract

We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connected to a plethora of open-source and commercial LLMs, automatic extraction of a behavior from the LLM output and execution of ROS actions/services, support for three behavior modes (sequence, behavior tree, state machine), imitation learning for adding new robot actions to the library of possible actions, and LLM reflection via human and environment feedback. Extensive experiments validate the framework, showcasing robustness, scalability, and versatility in diverse scenarios, including long-horizon tasks, tabletop rearrangements, and remote supervisory control. To facilitate the adoption of our framework and support the reproduction of our results, we have made our code open-source. You can access it at: https://github.com/huawei-noah/HEBO/tree/master/ROSLLM.
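As a rough illustration of the "automatic extraction of a behavior from the LLM output and execution of ROS actions/services" feature in sequence mode, the sketch below parses a JSON list of steps out of a free-form LLM reply and dispatches each step against an atomic action library. This is not the authors' implementation: the function names (`make_library`, `extract_sequence`, `run_sequence`), the `action`/`args` JSON schema, and the `pick`/`place` actions are all assumptions made for this example, and the plain Python callables stand in for real ROS action or service clients so that the code runs without a ROS installation.

```python
import json
from typing import Callable, Dict, List


def make_library(log: List[str]) -> Dict[str, Callable[..., bool]]:
    """Build a hypothetical atomic action library (name -> callable).

    Each callable stands in for a ROS action/service client and only
    records the call in `log` instead of moving a robot.
    """
    def pick(obj: str) -> bool:
        log.append(f"pick({obj})")
        return True

    def place(obj: str, target: str) -> bool:
        log.append(f"place({obj}, {target})")
        return True

    return {"pick": pick, "place": place}


def extract_sequence(llm_output: str) -> List[dict]:
    """Extract the first JSON array embedded in free-form LLM text."""
    start = llm_output.find("[")
    if start == -1:
        raise ValueError("no JSON array found in LLM output")
    # raw_decode parses one JSON value starting at `start` and ignores
    # any trailing prose after it.
    steps, _end = json.JSONDecoder().raw_decode(llm_output, start)
    if not isinstance(steps, list):
        raise ValueError("expected a JSON list of steps")
    return steps


def run_sequence(steps: List[dict], library: Dict[str, Callable[..., bool]]) -> bool:
    """Run steps in order; stop and return False on the first failure."""
    for step in steps:
        name = step["action"]
        if name not in library:
            # Reject actions the library does not provide.
            raise KeyError(f"unknown atomic action: {name}")
        if not library[name](**step.get("args", {})):
            return False
    return True


if __name__ == "__main__":
    reply = (
        'Here is the plan: [{"action": "pick", "args": {"obj": "mug"}}, '
        '{"action": "place", "args": {"obj": "mug", "target": "coffee_machine"}}] '
        "Tell me if anything fails."
    )
    calls: List[str] = []
    ok = run_sequence(extract_sequence(reply), make_library(calls))
    print(ok, calls)
```

Checking each action name against the library before calling it keeps the LLM restricted to known atomic actions. In a feedback loop like the one the abstract describes, a failed step or an unknown action name could be reported back to the LLM as environment feedback.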

Paper Structure (41 sections, 1 equation, 12 figures, 2 tables)

Introduction
Problem formulation
The ROS-LLM Framework
Atomic action library
Environment observation
Human non-expert interface
Prompt generation
Behavior representation
Python output
JSON output
XML output
Updating the atomic action library via imitation learning
Experiments
Experiments setup
Long-horizon tasks
...and 26 more sections

Figures (12)

Figure 1: Overview of a typical robotics development workflow.
Figure 2: Overview of our proposed ROS-LLM framework, illustrating how its components are integrated. Dashed lines denote elements that are introduced only once, such as the initial version of the atomic action library and the CoT/few-shot prompts provided by an expert.
Figure 3: Real-world laboratory setup used in our experiments.
Figure 4: Detailed steps in the coffee-making process arranged in a modified Z-shaped flow across four rows.
Figure 5: Detailed steps in the coffee-making process are depicted across twelve images: (a) picking up the mug, (b) placing the mug in the coffee machine, (c) opening the coffee machine cover, (d) opening the cabinet door, (e) picking up the bowl, (f) picking up the spoon, (g) scooping coffee from the bowl, (h) placing the coffee in the machine, (i) closing the coffee machine cover, (j) returning the bowl to the cabinet, (k) closing the cabinet door, (l) switching on the coffee machine.
...and 7 more figures
