Multi-Robot Task Planning for Multi-Object Retrieval Tasks with Distributed On-Site Knowledge via Large Language Models
Kento Murata, Shoichi Hasegawa, Tomochika Ishikawa, Yoshinobu Hagiwara, Akira Taniguchi, Lotfi El Hafi, Tadahiro Taniguchi
TL;DR
This work tackles multi-robot task planning under distributed on-site knowledge by combining large language models (LLMs) with a spatial concept model that encodes room names and room-wise object presence probabilities. The proposed pipeline decomposes natural language instructions into subtasks, allocates the subtasks to robots based on environment-grounded knowledge, plans actions sequentially, and executes them with a closed-loop feedback mechanism via FlexBE. Realistic simulations and RoboCup@Home experiments show that grounding allocations in spatial concepts substantially improves assignment accuracy (47/50 successful assignments, versus 28/50 for random assignment and 26/50 for commonsense-based assignment) and enables handling of ambiguous commands such as "Get ready for a field trip." The approach is a step toward scalable, robust multi-robot coordination in household-like settings and points to future enhancements, including heterogeneous teams and dynamic reallocation, for more complex real-world deployments.
Abstract
It is crucial to efficiently execute instructions such as "Find an apple and a banana" or "Get ready for a field trip," which require searching for multiple objects or understanding context-dependent commands. This study addresses the challenging problem of determining which robot should be assigned to which part of a task when each robot possesses different situational on-site knowledge, specifically spatial concepts learned from the area designated to it by the user. We propose a task planning framework that leverages large language models (LLMs) and spatial concepts to decompose natural language instructions into subtasks and allocate them to multiple robots. We designed a novel few-shot prompting strategy that enables LLMs to infer required objects from ambiguous commands and decompose them into appropriate subtasks. In our experiments, the proposed method achieved 47/50 successful assignments, outperforming random assignment (28/50) and commonsense-based assignment (26/50). Furthermore, we conducted qualitative evaluations using two actual mobile manipulators. The results demonstrated that our framework could handle instructions, including those involving ad hoc categories such as "Get ready for a field trip," by successfully performing task decomposition, assignment, sequential planning, and execution.
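To make the allocation idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the instruction has already been decomposed into a list of target objects, and it replaces the LLM-based assignment with a plain argmax over room-wise object presence probabilities. The robot names, room names, probability values, and the functions `best_location` and `allocate` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical on-site knowledge: for each robot, the probability that an
# object is present in each room of the area assigned to that robot.
# All names and numbers are invented for illustration.
KNOWLEDGE: Dict[str, Dict[str, Dict[str, float]]] = {
    "robot_a": {
        "kitchen": {"apple": 0.8, "banana": 0.7, "water_bottle": 0.3},
    },
    "robot_b": {
        "bedroom": {"hat": 0.6, "water_bottle": 0.1},
        "living_room": {"banana": 0.2, "hat": 0.3},
    },
}


def best_location(robot: str, obj: str) -> Tuple[str, float]:
    """Return the room this robot considers most likely to contain obj."""
    rooms = KNOWLEDGE[robot]
    room = max(rooms, key=lambda r: rooms[r].get(obj, 0.0))
    return room, rooms[room].get(obj, 0.0)


def allocate(objects: List[str]) -> Dict[str, Dict[str, object]]:
    """Assign each object-search subtask to the robot whose knowledge
    gives the highest presence probability for that object.

    Assumption: a simple argmax stands in for the LLM-based allocation
    described in the abstract.
    """
    plan: Dict[str, Dict[str, object]] = {}
    for obj in objects:
        candidates = {robot: best_location(robot, obj) for robot in KNOWLEDGE}
        robot = max(candidates, key=lambda r: candidates[r][1])
        room, prob = candidates[robot]
        plan[obj] = {"robot": robot, "room": room, "probability": prob}
    return plan


if __name__ == "__main__":
    # Subtasks assumed to come from decomposing "Find an apple and a banana"
    # plus one extra object to show a second robot being chosen.
    for obj, assignment in allocate(["apple", "banana", "hat"]).items():
        print(obj, assignment)
```

In this toy setup both fruit searches go to the robot that knows the kitchen, while the hat search goes to the robot that knows the bedroom. A commonsense-only baseline would have no access to these per-site probabilities, which is the gap the grounded allocation is meant to close.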
