CAMON: Cooperative Agents for Multi-Object Navigation with LLM-based Conversations

Pengying Wu, Yao Mu, Kangjie Zhou, Ji Ma, Junting Chen, Chang Liu

TL;DR

CAMON tackles cooperative multi-object navigation in indoor environments by enabling multiple robots to communicate and coordinate via LLMs within a communication-triggered dynamic leadership framework. It fuses perception, planning, and control into a decentralized pipeline in which each agent maintains a local semantic map, describes room-level scenes, and negotiates task division through LLM-driven proposals and leader coordination. Key contributions include a perception module for room-aware scene understanding, a dynamic leadership mechanism to balance information flow, and a planning workflow that minimizes communication while reaching fast consensus on actions and object targets. The approach promises robust, scalable collaboration for home-service robotics, with potential extensions to dynamic objects and cross-floor navigation.

Abstract

Visual navigation tasks are critical for household service robots. As these tasks become increasingly complex, effective communication and collaboration among multiple robots become imperative to ensure successful completion. In recent years, large language models (LLMs) have exhibited remarkable comprehension and planning abilities in the context of embodied agents. However, their application in household scenarios, specifically in the use of multiple agents collaborating to complete complex navigation tasks through communication, remains unexplored. Therefore, this paper proposes a framework for decentralized multi-agent navigation, leveraging LLM-enabled communication and collaboration. By designing the communication-triggered dynamic leadership organization structure, we achieve faster team consensus with fewer communication instances, leading to better navigation effectiveness and collaborative exploration efficiency. With the proposed novel communication scheme, our framework promises to be conflict-free and robust in multi-object navigation tasks, even when there is a surge in team size.

Paper Structure (16 sections, 2 equations, 2 figures)

Figures (2)

  • Figure 1: We contribute CAMON: a framework for Cooperative Multi-Object Navigation in indoor environments. This figure shows three agents collaborating to find some objects, and the dialog box represents the agents' conversation contents. In CAMON, the agents make decisions that do not conflict with other robots and maximize team collaboration benefits by asking their current leaders.
  • Figure 2: Components of CAMON. Our framework comprises three modules: perception, communication, and control. The perception module generates a real-time semantic map using robot RGB-D and pose inputs, from which the agent extracts topology maps, and segments and describes rooms. Agent_1 makes global decisions by querying the current leader, Agent_2, to obtain the target room. Leadership and global information are then conveyed from Agent_2 to Agent_1. Finally, the control module generates a sequence of actions for Agent_1 to navigate from its current position to the target room.
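
The leadership handover described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names (`Agent`, `TeamState`), the method `handle_query`, and the room names are all hypothetical, and the LLM-based decision is replaced by a simple "first room not assigned to another agent" rule, purely as a stand-in. The sketch shows only the control flow: an agent queries the current leader, receives a target room that does not conflict with other agents, and then takes over leadership together with the global information.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TeamState:
    """Global information held by whichever agent is currently the leader (hypothetical)."""
    unexplored_rooms: list
    assignments: dict = field(default_factory=dict)  # agent name -> target room


@dataclass
class Agent:
    name: str
    is_leader: bool = False
    team_state: Optional[TeamState] = None

    def handle_query(self, requester: "Agent") -> Optional[str]:
        """Answer a query as the current leader, then hand leadership to the requester."""
        if not self.is_leader or self.team_state is None:
            raise RuntimeError(f"{self.name} is not the current leader")
        state = self.team_state
        # Rooms already assigned to *other* agents are off limits (conflict-free choice).
        taken = {room for agent, room in state.assignments.items()
                 if agent != requester.name}
        # Placeholder for the LLM decision: pick the first free unexplored room.
        room = next((r for r in state.unexplored_rooms if r not in taken), None)
        if room is not None:
            state.assignments[requester.name] = room
        # Communication-triggered handover: the requester becomes the new leader
        # and receives the global information.
        if requester is not self:
            requester.team_state, requester.is_leader = state, True
            self.team_state, self.is_leader = None, False
        return room


# Usage: Agent_2 starts as leader, as in the Figure 2 caption.
a1, a2, a3 = Agent("Agent_1"), Agent("Agent_2"), Agent("Agent_3")
a2.is_leader = True
a2.team_state = TeamState(unexplored_rooms=["kitchen", "bedroom", "bathroom"])

room1 = a2.handle_query(a1)  # Agent_1 asks leader Agent_2, then becomes leader
room3 = a1.handle_query(a3)  # Agent_3 asks the new leader Agent_1
```

Because only the current leader holds the shared assignment table, each query is resolved with a single exchange, which is one plausible reading of how the scheme keeps communication low while avoiding conflicting targets.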