Generative Multi-Agent Collaboration in Embodied AI: A Systematic Review

Di Wu, Xian Wei, Guang Chen, Hao Shen, Xiangfeng Wang, Wenhao Li, Bo Jin

TL;DR

This survey addresses the challenge of enabling scalable, robust collaboration in embodied AI by integrating generative foundation models (FMs) into multi-agent systems. It provides a systematic review with a taxonomy of collaborative architectures (extrinsic, intrinsic, and hybrid) and analyzes the core modules of perception, planning, communication, and feedback through the lens of FM capabilities. Key contributions include mapping architectural options to module design, surveying simulation platforms and real-world applications, and outlining open challenges such as Sim2Real transfer, human-centricity, and interpretability. The work informs researchers and practitioners on how to design and deploy FM-guided embodied multi-agent systems (EMAS), with potential impact in logistics, robotics, and service automation.

Abstract

Embodied multi-agent systems (EMAS) have attracted growing attention for their potential to address complex, real-world challenges in areas such as logistics and robotics. Recent advances in foundation models pave the way for generative agents capable of richer communication and adaptive problem-solving. This survey provides a systematic examination of how EMAS can benefit from these generative capabilities. We propose a taxonomy that categorizes EMAS by system architectures and embodiment modalities, emphasizing how collaboration spans both physical and virtual contexts. The central building blocks (perception, planning, communication, and feedback) are then analyzed to illustrate how generative techniques bolster system robustness and flexibility. Through concrete examples, we demonstrate the transformative effects of integrating foundation models into embodied, multi-agent frameworks. Finally, we discuss challenges and future directions, underlining the significant promise of EMAS to reshape the landscape of AI-driven collaboration.

Paper Structure

This paper contains 37 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: A unified multi-agent framework for generative embodied AI.
  • Figure 2: The embodied multi-agent collaborative architecture.