Conversational Language Models for Human-in-the-Loop Multi-Robot Coordination
William Hunt, Toby Godfrey, Mohammad D. Soorati
TL;DR
The paper addresses the need for flexible, on-the-fly planning in heterogeneous multi-robot teams. It introduces a decentralised, dialogue-based approach in which LLM-powered agents with different capabilities negotiate and plan through peer-to-peer and human-in-the-loop communication. A proof of concept uses two TurtleBot3 robots equipped with LIDAR and cameras, which coordinate through text-based dialogues to collect rubbish under human supervision; the human can interrupt at any step or request replanning. The contribution lies in showing that argument-style dialogues can make effective use of diverse agent capabilities, and that text-based, transparent reasoning supports reliable human-robot interaction. This has practical implications for scalable, explainable multi-robot deployments.
Abstract
With the increasing prevalence and diversity of robots interacting in the real world, there is a need for flexible, on-the-fly planning and cooperation. Large Language Models are starting to be explored in a multimodal setup for communication, coordination, and planning in robotics. Existing approaches generally use a single agent to build a plan, or have multiple homogeneous agents coordinate on a simple task. We present a decentralised, dialogical approach in which a team of agents with different abilities plans solutions through peer-to-peer and human-robot discussion. We suggest that argument-style dialogues are an effective way to facilitate adaptive use of each agent's abilities within a cooperative team. Two robots discuss how to solve a cleaning problem set by a human, define roles, and agree on the paths they will each take. Each step can be interrupted by a human advisor, and agents check their plans with the human. Agents then execute this plan in the real world, collecting rubbish from people in each room. Our implementation uses text at every step, maintaining transparency and effective human-multi-robot interaction.
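To make the dialogue loop described above concrete, here is a minimal sketch of a turn-based planning conversation with a human who may interject after any turn. It is not the authors' implementation: the names `Message`, `Agent`, `run_dialogue`, and `scripted_reply`, the `PROPOSE`/`AGREE` convention, and the listed abilities are all assumptions made for illustration, and a scripted function stands in for the LLM call so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Message:
    sender: str
    text: str


@dataclass
class Agent:
    name: str
    abilities: List[str]
    # Stand-in for an LLM call: maps the transcript so far to a text reply.
    respond: Callable[["Agent", List[Message]], str]


def scripted_reply(agent: Agent, transcript: List[Message]) -> str:
    """Hypothetical rule replacing an LLM: propose a role once, then agree."""
    already_spoke = any(m.sender == agent.name for m in transcript)
    if not already_spoke:
        return f"PROPOSE: I can {agent.abilities[0]}."
    return "AGREE"


def last_message_from(transcript: List[Message], name: str) -> Optional[str]:
    """Return the most recent message text sent by `name`, if any."""
    for message in reversed(transcript):
        if message.sender == name:
            return message.text
    return None


def run_dialogue(
    agents: List[Agent],
    task: str,
    human_check: Callable[[List[Message]], Optional[str]],
    max_rounds: int = 5,
) -> List[Message]:
    """Let agents take turns in plain text until every agent's latest turn is AGREE."""
    transcript = [Message("human", task)]
    for _ in range(max_rounds):
        for agent in agents:
            transcript.append(Message(agent.name, agent.respond(agent, transcript)))
            # The human may interject after any turn; None means no comment.
            note = human_check(transcript)
            if note is not None:
                transcript.append(Message("human", note))
        if all(last_message_from(transcript, a.name) == "AGREE" for a in agents):
            break
    return transcript


# Assumed abilities, chosen only to give the two agents different roles.
agents = [
    Agent("alpha", ["map the rooms with LIDAR"], scripted_reply),
    Agent("beta", ["spot rubbish with the camera"], scripted_reply),
]

# The human interrupts once, right after the first robot turn.
def human_check(transcript: List[Message]) -> Optional[str]:
    return "Please visit the kitchen first." if len(transcript) == 2 else None

transcript = run_dialogue(agents, "Collect rubbish from every room.", human_check)
for message in transcript:
    print(f"{message.sender}: {message.text}")
```

Because every turn, including the human's interjection, is stored as plain text in one transcript, the full reasoning trail can be read back afterwards, which is the kind of transparency the abstract emphasises.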
