Language-Conditioned Offline RL for Multi-Robot Navigation
Steven Morad, Ajay Shankar, Jan Blumenkamp, Amanda Prorok
TL;DR
This paper addresses natural-language-driven navigation for multi-robot teams by conditioning low-latency control policies on embeddings from pretrained LLMs and training them exclusively on offline real-world data. It uses a two-stage approach: (1) collect a single-robot dataset, then (2) virtually generate a much larger combinatorial multi-agent dataset from it, which enables offline multi-agent reinforcement learning (MARL) without a simulator. The authors recast Q-learning as an offline Expected SARSA objective and compare several variants (Mean Q, Soft Q, and CQL), finding that safer objectives that stay grounded in the collected data generalize robustly to unseen commands and deploy stably on real robots. Real-robot experiments with up to five agents show generalization to novel instructions, low control latency, and negligible collisions, which suggests that language-conditioned, offline-trained multi-robot policies are practical without finetuning.
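The Expected SARSA framing mentioned above replaces the usual maximum over next-step Q-values with an expectation under some policy. The following is a minimal sketch of that target, not the authors' implementation. It assumes a discrete action space, and it assumes that "Mean Q" corresponds to an expectation under a uniform policy and "Soft Q" to an expectation under a Boltzmann (softmax) policy; neither definition is given in this summary. The function name, argument names, and the `"max"` variant (standard Q-learning, included only for comparison) are illustrative, and CQL is omitted because it adds a separate regularization term.

```python
import numpy as np


def expected_sarsa_target(reward, next_q, done, gamma=0.99,
                          variant="mean", temperature=1.0):
    """Return r + gamma * E_{a' ~ pi}[Q(s', a')] for a batch of transitions.

    reward: shape (B,)    next_q: shape (B, A)    done: shape (B,), 1.0 if terminal.
    The choice of pi for each variant is an assumption of this sketch.
    """
    reward = np.asarray(reward, dtype=float)
    next_q = np.asarray(next_q, dtype=float)
    done = np.asarray(done, dtype=float)

    if variant == "mean":
        # Assumed "Mean Q": pi is uniform over actions.
        expected_q = next_q.mean(axis=1)
    elif variant == "soft":
        # Assumed "Soft Q": pi is a softmax over Q-values.
        logits = next_q / temperature
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs = probs / probs.sum(axis=1, keepdims=True)
        expected_q = (probs * next_q).sum(axis=1)
    elif variant == "max":
        # Standard Q-learning: pi is greedy with respect to Q.
        expected_q = next_q.max(axis=1)
    else:
        raise ValueError(f"unknown variant: {variant!r}")

    # Terminal transitions do not bootstrap from the next state.
    return reward + gamma * (1.0 - done) * expected_q
```

Under these assumptions the three targets are ordered mean ≤ soft ≤ max for any transition, so the uniform expectation is the least optimistic about actions that the offline data may not cover.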
Abstract
We present a method for developing navigation policies for multi-robot teams that interpret and follow natural language instructions. We condition these policies on embeddings from pretrained Large Language Models (LLMs), and train them via offline reinforcement learning with as little as 20 minutes of randomly collected data. Experiments on a team of five real robots show that these policies generalize well to unseen commands, indicating an understanding of the LLM latent space. Our method requires no simulators or environment models, and produces low-latency control policies that can be deployed directly to real robots without finetuning. We provide videos of our experiments at https://sites.google.com/view/llm-marl.
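
To illustrate what conditioning a policy on a language embedding can look like, here is a minimal sketch, not the authors' architecture. It assumes a discrete action space and a small Q-network that takes the robot observation concatenated with a fixed task embedding. Everything named here is hypothetical: `embed_command` is a deterministic stand-in for a pretrained LLM encoder, the dimensions and the example command are made up, and the network weights are random rather than trained. The point it shows is that the command is embedded once, so the control loop only runs the small network.

```python
import hashlib

import numpy as np

EMBED_DIM = 16    # hypothetical embedding size
OBS_DIM = 4       # hypothetical observation size
NUM_ACTIONS = 5   # hypothetical number of discrete actions


def embed_command(text):
    """Hypothetical stand-in for an LLM encoder: a deterministic unit vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:4], "little")
    vec = np.random.default_rng(seed).standard_normal(EMBED_DIM)
    return vec / np.linalg.norm(vec)


class LanguageConditionedQ:
    """Tiny two-layer Q-network over [observation, task embedding]."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        # Random, untrained weights; a real policy would be learned offline.
        self.w1 = rng.standard_normal((OBS_DIM + EMBED_DIM, 32)) * 0.1
        self.w2 = rng.standard_normal((32, NUM_ACTIONS)) * 0.1

    def q_values(self, obs, task_embedding):
        x = np.concatenate([obs, task_embedding])
        return np.tanh(x @ self.w1) @ self.w2

    def act(self, obs, task_embedding):
        # Greedy action with respect to the conditioned Q-values.
        return int(np.argmax(self.q_values(obs, task_embedding)))


# Embed the command once, outside the control loop, then reuse the vector.
task = embed_command("drive to the left edge")  # hypothetical command
policy = LanguageConditionedQ()
action = policy.act(np.zeros(OBS_DIM), task)
```

Because the embedding is cached, the per-step cost is independent of the size of the language model, which is one plausible reading of how such a policy can remain low latency.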
