LLM-MCoX: Large Language Model-based Multi-robot Coordinated Exploration and Search

Ruiyang Wang, Hao-Lun Hsu, David Hunt, Shaocheng Luo, Jiwoo Kim, Miroslav Pajic

TL;DR

The paper tackles efficient multi-robot exploration and object search in unknown indoor environments by introducing LLM-MCoX, a centralized planner that fuses a shared LiDAR-based occupancy map with representative frontiers, doorway cues, and natural-language guidance. By encoding the global map as a grayscale image and leveraging multimodal reasoning in an LLM, LLM-MCoX assigns long-horizon, semantically aware waypoint sequences to each robot, with execution feedback and plan memory to sustain coherence. Across simulations and real-world experiments, the approach outperforms greedy and DVC baselines, achieving up to 22.7% faster exploration and up to 50% better search efficiency in large environments with six robots, and it benefits further from natural-language hints in language-guided search. This work demonstrates the practical viability of semantic-aware, centralized planning for scalable multi-robot coordination in complex, partially observed environments.
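To make the "global map as a grayscale image" step concrete, here is a minimal sketch of one way an occupancy grid could be rendered as an 8-bit image for a multimodal model. The cell codes (`UNKNOWN`, `FREE`, `OCCUPIED`), the gray levels, and the helper names `grid_to_grayscale` and `to_pgm` are assumptions chosen for illustration; the summary does not state the encoding the authors use.

```python
# Hypothetical cell codes and gray levels (assumptions, not taken from the paper).
UNKNOWN, FREE, OCCUPIED = -1, 0, 1
GRAY = {UNKNOWN: 127, FREE: 255, OCCUPIED: 0}


def grid_to_grayscale(grid):
    """Map each occupancy cell to an 8-bit gray level, row by row."""
    return [[GRAY[cell] for cell in row] for row in grid]


def to_pgm(image):
    """Serialize the gray image as a plain-text PGM (P2) string."""
    height, width = len(image), len(image[0])
    rows = [" ".join(str(v) for v in row) for row in image]
    return "\n".join(["P2", f"{width} {height}", "255", *rows]) + "\n"


# A tiny 3x4 map: a walled corridor opening onto unexplored space on the right.
grid = [
    [OCCUPIED, OCCUPIED, OCCUPIED, UNKNOWN],
    [OCCUPIED, FREE, FREE, UNKNOWN],
    [OCCUPIED, OCCUPIED, OCCUPIED, UNKNOWN],
]
image = grid_to_grayscale(grid)
pgm_text = to_pgm(image)
```

Using three well-separated gray levels keeps free, occupied, and unknown space visually distinct. A real pipeline would likely also overlay robot positions and candidate waypoints before sending the image to the model.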

Abstract

Autonomous exploration and object search in unknown indoor environments remain challenging for multi-robot systems (MRS). Traditional approaches often rely on greedy frontier assignment strategies with limited inter-robot coordination. In this work, we introduce LLM-MCoX (LLM-based Multi-robot Coordinated Exploration and Search), a novel framework that leverages Large Language Models (LLMs) for intelligent coordination of both homogeneous and heterogeneous robot teams tasked with efficient exploration and target object search. Our approach combines real-time LiDAR scan processing for frontier cluster extraction and doorway detection with multimodal LLM reasoning (e.g., GPT-4o) to generate coordinated waypoint assignments based on shared environment maps and robot states. LLM-MCoX demonstrates superior performance compared to existing methods, including greedy and Voronoi-based planners, achieving 22.7% faster exploration times and 50% improved search efficiency in large environments with 6 robots. Notably, LLM-MCoX enables natural language-based object search capabilities, allowing human operators to provide high-level semantic guidance that traditional algorithms cannot interpret.
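The limitation of greedy frontier assignment noted in the abstract can be illustrated with a small sketch. It assumes the simplest greedy rule, in which each robot independently picks its nearest frontier by straight-line distance. The function name `greedy_assign` and the coordinates are hypothetical, and the greedy baseline evaluated in the paper may differ in detail.

```python
import math


def greedy_assign(robots, frontiers):
    """Each robot independently picks its nearest frontier (no coordination)."""
    return {
        name: min(frontiers, key=lambda f: math.dist(pos, f))
        for name, pos in robots.items()
    }


# Two robots start close together; one frontier is near, the other far away.
robots = {"r1": (0.0, 0.0), "r2": (1.0, 0.0)}
frontiers = [(2.0, 0.0), (10.0, 0.0)]
assignment = greedy_assign(robots, frontiers)
```

In this toy case both robots choose the frontier at `(2.0, 0.0)` and the distant frontier is left unvisited, which is the kind of redundant effort a coordinated planner aims to avoid.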

Paper Structure

This paper contains 17 sections, 1 equation, 8 figures, 1 algorithm.

Figures (8)

  • Figure 1: Representative Frontier and Doorway Detection: Frontier cells (light blue) are sampled to form representative frontiers (blue dots), while potential doorways (green dots) are identified based on structural gaps in frontier regions.
  • Figure 2: LLM-MCoX Planning Pipeline. At each cycle, robots share LiDAR maps to update a global shared known map, from which representative frontiers and potential doorways are extracted. These geometric features, together with robot states, execution summaries, characteristics, and the previous plan, are provided to the LLM-based planner for waypoint assignment. Optional natural-language input from human operators can supply semantic cues for targeted exploration.
  • Figure 3: Example query and response from the LLM for waypoint assignment in a multi-robot team.
  • Figure 4: Example simulation environments used for evaluation.
  • Figure 5: Exploration and search performance across small, medium, and large environments. For each method, results are collected over 10 randomized maps per environment size, and performance is summarized using quartile plots.
  • ...and 3 more figures
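
The frontier extraction described in the Figure 1 caption can be sketched as follows. This version assumes the common definition of a frontier cell (a free cell with at least one unknown 4-neighbour) and uses greedy distance-based thinning to choose representative frontiers. The cell codes, the `min_dist` parameter, and both function names are illustrative assumptions. The caption does not specify the paper's sampling rule, and doorway detection is not covered here.

```python
# Hypothetical cell codes (assumptions, not taken from the paper).
UNKNOWN, FREE, OCCUPIED = -1, 0, 1


def frontier_cells(grid):
    """Return (row, col) of free cells with at least one unknown 4-neighbour."""
    rows, cols = len(grid), len(grid[0])
    found = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    found.append((r, c))
                    break
    return found


def sample_representatives(cells, min_dist):
    """Greedy thinning: keep a cell only if it is at least min_dist from every kept cell."""
    kept = []
    for r, c in cells:
        if all((r - kr) ** 2 + (c - kc) ** 2 >= min_dist ** 2 for kr, kc in kept):
            kept.append((r, c))
    return kept


# A 3x3 free room, walled on the top and left, open to unknown space elsewhere.
grid = [
    [OCCUPIED, OCCUPIED, OCCUPIED, OCCUPIED, OCCUPIED],
    [OCCUPIED, FREE, FREE, FREE, UNKNOWN],
    [OCCUPIED, FREE, FREE, FREE, UNKNOWN],
    [OCCUPIED, FREE, FREE, FREE, UNKNOWN],
    [OCCUPIED, UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN],
]
cells = frontier_cells(grid)
reps = sample_representatives(cells, min_dist=2)
```

Here five frontier cells along the right and bottom edges of the room are thinned to three representatives. Reducing the frontier set this way keeps the list of candidate waypoints short enough to pass to a planner.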