Embodied AI in Mobile Robots: Coverage Path Planning with Large Language Models
Xiangrui Kong, Wenxiao Zhang, Jin Hong, Thomas Braunl
TL;DR
This work tackles coverage path planning for mobile robots by using LLMs as an embodied AI brain within a three-layer framework that separates global planning, waypoint evaluation, and navigation. The method prompts LLMs to generate and refine waypoint sequences from grid maps, evaluates their feasibility with metrics such as the shortest path length $r$ and turns $\tau$, and guides real-time navigation with a safety-oriented control loop. A Coverage-Weighted Path Length (CPL) metric is introduced to quantify the trade-off between coverage completeness and path length: $CPL = \frac{1}{N} \sum_{i=1}^{N} \frac{A_i}{\bar{A}_i} \frac{l_i}{\max(p_i,l_i)}$, where $A_i$ is the coverage area, $\bar{A}_i$ the mission area, $l_i$ the shortest path length, and $p_i$ the planned path length. Experimental results across map sizes show Claude-3.5-Sonnet often delivering the best coverage performance and speed, GPT-4o providing robust results, and Gemini-1.5-Flash producing consistent outputs that are occasionally affected by formatting problems. These results suggest that LLMs can serve as effective embodied planners for static coverage tasks when guided by well-designed prompts. The approach demonstrates the potential of LLM-driven embodied AI to enable flexible, efficient coverage navigation without task-specific training data, advancing practical deployment of autonomous robots in structured environments.
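To make the CPL formula concrete, the following minimal Python sketch evaluates it directly from the definition above. The `Episode` container, its field names, and the function name `coverage_weighted_path_length` are hypothetical and introduced only for illustration. Treating each index $i$ as one evaluated run is likewise an assumption, since the summary does not say what $N$ counts.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Episode:
    """One evaluated run (hypothetical container, not from the paper)."""
    covered_area: float    # A_i: area actually covered
    mission_area: float    # \bar{A}_i: area that should be covered
    shortest_path: float   # l_i: shortest path length
    planned_path: float    # p_i: length of the planned path


def coverage_weighted_path_length(episodes: Sequence[Episode]) -> float:
    """Average of (A_i / \\bar{A}_i) * (l_i / max(p_i, l_i)) over all episodes."""
    if not episodes:
        raise ValueError("at least one episode is required")
    total = 0.0
    for ep in episodes:
        if ep.mission_area <= 0 or ep.shortest_path <= 0:
            raise ValueError("mission_area and shortest_path must be positive")
        coverage_ratio = ep.covered_area / ep.mission_area
        # max(p_i, l_i) caps the path-efficiency factor at 1.
        path_efficiency = ep.shortest_path / max(ep.planned_path, ep.shortest_path)
        total += coverage_ratio * path_efficiency
    return total / len(episodes)


if __name__ == "__main__":
    runs = [
        Episode(covered_area=9, mission_area=9, shortest_path=8, planned_path=8),
        Episode(covered_area=6, mission_area=12, shortest_path=10, planned_path=20),
    ]
    print(coverage_weighted_path_length(runs))
```

Under these assumptions, a run with full coverage along a shortest-length path contributes 1, while a run that covers half the mission area with a path twice the shortest length contributes 0.25. The score therefore penalizes both incomplete coverage and detours.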
Abstract
In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in understanding and solving mathematical problems, leading to advancements in various fields. We propose an LLM-embodied path planning framework for mobile agents that addresses both high-level coverage path planning and low-level control. The proposed multi-layer architecture uses prompted LLMs in the path planning phase and integrates them with the mobile agents' low-level actuators. To compare the embodied models, we propose a coverage-weighted path planning metric. Our experiments show that the framework improves LLMs' spatial reasoning on the 2D plane and enables them to complete coverage path planning tasks. By leveraging the natural language understanding and generative capabilities of LLMs, it also improves the efficiency and accuracy of these tasks. We tested three LLM kernels: gpt-4o, gemini-1.5-flash, and claude-3.5-sonnet. The experimental results show that claude-3.5-sonnet completes the coverage planning task in different scenarios and scores better than the other models on the evaluation metrics.
