EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents

Junting Chen, Checheng Yu, Xunzhe Zhou, Tianqi Xu, Yao Mu, Mengkang Hu, Wenqi Shao, Yikai Wang, Guohao Li, Lin Shao

TL;DR

This work tackles the challenge of coordinating heterogeneous robots with differing embodiments under a large-language-model-based multi-agent framework. It introduces EMOS, an embodiment-aware multi-agent system (MAS) in which agents generate their own robot resumes from URDF files, and which uses a three-stage, hierarchical planning-execution pipeline to coordinate heterogeneous multi-robot systems (HMRS) in indoor environments. A key contribution is the Habitat-MAS benchmark, a simulation-based, multi-floor benchmark featuring drones, wheeled robots, and legged robots, which evaluates embodiment-aware reasoning across perception, navigation, and manipulation tasks. Experimental results, including ablations, indicate that robot resumes and embodiment-aware reasoning substantially improve task success, while also revealing trade-offs in token usage and scalability. This positions EMOS as a step toward higher automation in complex HMRS settings.

Abstract

Heterogeneous multi-robot systems (HMRS) have emerged as a powerful approach for tackling complex tasks that single robots cannot manage alone. Current large-language-model-based multi-agent systems (LLM-based MAS) have shown success in areas like software development and operating systems, but applying these systems to robot control presents unique challenges. In particular, the capabilities of each agent in a multi-robot system are inherently tied to the physical composition of the robots, rather than predefined roles. To address this issue, we introduce a novel multi-agent framework designed to enable effective collaboration among heterogeneous robots with varying embodiments and capabilities, along with a new benchmark named Habitat-MAS. One of our key designs is $\textit{Robot Resume}$: Instead of adopting human-designed role play, we propose a self-prompted approach, where agents comprehend robot URDF files and call robot kinematics tools to generate descriptions of their physics capabilities to guide their behavior in task planning and action execution. The Habitat-MAS benchmark is designed to assess how a multi-agent framework handles tasks that require embodiment-aware reasoning, which includes 1) manipulation, 2) perception, 3) navigation, and 4) comprehensive multi-floor object rearrangement. The experimental results indicate that the robot's resume and the hierarchical design of our multi-agent system are essential for the effective operation of the heterogeneous multi-robot system within this intricate problem context.
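To make the Robot Resume idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical toy URDF (`TOY_URDF`) and hypothetical helper names (`summarize_urdf`, `chain_length_bound`, `render_resume`). It reads the URDF skeleton with Python's standard XML parser and stands in for a kinematics tool with a crude bound: the sum of joint-origin offset lengths along a chain, which limits how far a link frame can be from the root when the chain has no prismatic joints. In the paper this numerical part is produced by forward kinematics tool calls made by an LLM agent, which this sketch does not reproduce.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical toy URDF for illustration; not one of the robots in the paper.
TOY_URDF = """
<robot name="toy_mobile_manipulator">
  <link name="base"/>
  <link name="left_wheel"/>
  <link name="upper_arm"/>
  <link name="forearm"/>
  <joint name="wheel_joint" type="continuous">
    <parent link="base"/><child link="left_wheel"/>
    <origin xyz="0 0.2 0"/>
  </joint>
  <joint name="shoulder" type="revolute">
    <parent link="base"/><child link="upper_arm"/>
    <origin xyz="0 0 0.5"/>
    <limit lower="-1.57" upper="1.57" effort="10" velocity="1"/>
  </joint>
  <joint name="elbow" type="revolute">
    <parent link="upper_arm"/><child link="forearm"/>
    <origin xyz="0.3 0 0"/>
    <limit lower="0" upper="2.0" effort="10" velocity="1"/>
  </joint>
</robot>
"""


def summarize_urdf(urdf_text):
    """Extract the link/joint skeleton of a URDF as plain Python data."""
    root = ET.fromstring(urdf_text)
    joints = []
    for joint in root.findall("joint"):
        origin = joint.find("origin")
        xyz = origin.get("xyz", "0 0 0") if origin is not None else "0 0 0"
        offset = [float(v) for v in xyz.split()]
        joints.append({
            "name": joint.get("name"),
            "type": joint.get("type"),
            "parent": joint.find("parent").get("link"),
            "child": joint.find("child").get("link"),
            # Length of the fixed offset from the parent frame to this joint.
            "offset_length": math.sqrt(sum(v * v for v in offset)),
        })
    links = [link.get("name") for link in root.findall("link")]
    return {"robot": root.get("name"), "links": links, "joints": joints}


def chain_length_bound(summary, tip_link):
    """Sum joint offset lengths from tip_link up to the root link.

    By the triangle inequality this bounds the distance between the root
    frame and the tip link frame, assuming no prismatic joints in the chain.
    """
    by_child = {j["child"]: j for j in summary["joints"]}
    total, link = 0.0, tip_link
    while link in by_child:
        total += by_child[link]["offset_length"]
        link = by_child[link]["parent"]
    return total


def render_resume(summary, tip_link):
    """Render a short textual capability description (an assumed format)."""
    counts = {}
    for joint in summary["joints"]:
        counts[joint["type"]] = counts.get(joint["type"], 0) + 1
    joint_text = ", ".join(f"{n} {t}" for t, n in sorted(counts.items()))
    bound = chain_length_bound(summary, tip_link)
    return (
        f"Robot '{summary['robot']}' has {len(summary['links'])} links "
        f"and joints: {joint_text}. "
        f"Link '{tip_link}' stays within about {bound:.2f} m of the root link origin."
    )


if __name__ == "__main__":
    print(render_resume(summarize_urdf(TOY_URDF), "forearm"))
```

The textual part mirrors the "read the skeleton" half of resume generation, and the bound mirrors the "numerical details" half in a deliberately simplified form. A real pipeline would replace `chain_length_bound` with proper forward kinematics over the joint limits.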

Paper Structure

This paper contains 28 sections, 1 equation, 9 figures, 6 tables, 1 algorithm.

Figures (9)

  • Figure 1: Embodiment-aware LLM-based MAS. This figure depicts how an LLM-based MAS operates an HMRS composed of drones, legged robots, and wheeled robots with robotic arms in a multi-floor house. When given a household task, the agents in the LLM-based MAS need to understand their respective robots' hardware specifications for task planning and assignment. The authors refer to this capability as "embodiment-aware reasoning" in this work.
  • Figure 2: EMOS Framework. This figure illustrates how EMOS operates an HMRS on the Habitat-MAS platform. There are three stages: 1) Scene Context Construction generates scene descriptions in a bottom-up manner, relying on an ideal semantic SLAM system. 2) In Centralized Group Discussion, agents perform embodiment-aware reasoning for task planning and assignment. 3) In Decentralized Action Parallel Execution, agents execute actions in parallel, given the initial context and agent history. Strictly speaking, EMOS only includes stages 2 and 3, while stage 1 is integrated into the Habitat-MAS platform. We include it in this diagram for completeness and clarity.
  • Figure 3: Robot Resume Generation. This figure illustrates how an LLM agent is prompted to generate a robot resume from the robot's URDF file by combining two approaches. On one hand, the LLM agent reads the skeleton of the URDF to summarize a textual description of the general capability. On the other hand, the LLM agent calls forward kinematics tool functions to generate numerical details.
  • Figure 4: Habitat-MAS Benchmark. The figure shows the four tasks (columns) from two indoor scene datasets, HSSD [khanna2023hssd] and Matterport 3D [chang2017matterport3d]. The upper row shows the top-down maps of the environment and the successful navigation paths of the tasks. The middle and bottom rows depict the key frames of the tasks in the third-person view when robots perceive or manipulate the target objects.
  • Figure 5: Experimental Results of EMOS and Ablated Methods by Tasks. This figure illustrates the performance of EMOS and ablated methods on the four tasks in the Habitat-MAS benchmark, which are placed in four columns. For each task and each ablation setting, we plot the task success rate with a blue line in the upper row, and a histogram of token usage and simulation steps in the lower row. In terms of success rate, the EMOS framework achieves a clear margin over the ablation settings, especially the setting without robot resume. The dashed line shows the difference in difficulty across the four tasks: each data point on the dashed line represents the success rate averaged over all ablation settings for that task.
  • ...and 4 more figures