Data-Efficient Multi-Agent Spatial Planning with LLMs
Huangyuan Su, Aaron Walsman, Daniel Garces, Sham Kakade, Stephanie Gil
TL;DR
This work treats multi-agent taxi routing on a graph as a data-efficient planning problem and demonstrates that pretrained LLMs can serve as effective base policies for spatial planning. By combining zero-shot prompting, grounding of graph information, and a rollout-based offline/online learning framework, the approach outperforms existing methods while using 50 times fewer environment interactions. Fine-tuning via one-at-a-time rollout substantially improves planning quality and mitigates spatial hallucinations, while a comparison of prompting strategies reveals trade-offs between accuracy and computation. The results show strong generalization to larger maps and more agents, highlighting the potential for LLMs to accelerate data-efficient planning in dynamic, multi-agent logistics tasks. The work also outlines limitations such as inference speed and hallucination risks, and points to future directions including predictive value functions and broader benchmarks.
Abstract
In this project, our goal is to determine how to leverage the world knowledge of pretrained large language models (LLMs) for efficient and robust learning in multi-agent decision making. We examine this in a taxi routing and assignment problem where agents must decide how best to pick up passengers in order to minimize overall waiting time. Although this problem is situated on a road network represented as a graph, we show that with proper prompting, zero-shot performance on this task is quite strong. Furthermore, with limited fine-tuning combined with the one-at-a-time rollout algorithm for lookahead, LLMs can outperform existing approaches with 50 times fewer environment interactions. We also explore the benefits of various linguistic prompting approaches and show that including certain easy-to-compute information in the prompt significantly improves performance. Finally, we highlight the LLM's built-in semantic understanding, showing its ability to adapt to environmental factors through simple prompts.
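To make the one-at-a-time rollout idea concrete, the following is a minimal sketch on a toy problem, not the authors' implementation. Everything in it is an assumption for illustration: the five-node path graph `GRAPH`, the greedy nearest-request heuristic `base_policy` (a stand-in for the LLM base policy), the per-step cost (number of requests still waiting), and the helper names. Agents choose their actions in sequence; when agent i is deciding, earlier agents use their already-improved actions, later agents use the base policy, and each candidate is scored by simulating the base policy for a short horizon.

```python
from collections import deque

# Hypothetical toy road network: a path graph 0-1-2-3-4.
GRAPH = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

def dist(src, dst):
    """Hop distance between two nodes via breadth-first search."""
    seen, queue = {src: 0}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return seen[node]
        for nxt in GRAPH[node]:
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    raise ValueError("destination unreachable")

def base_policy(taxi, requests):
    """Stand-in for the LLM base policy (assumption): step toward the nearest request."""
    if not requests:
        return taxi  # nothing to serve, stay put
    target = min(sorted(requests), key=lambda r: dist(taxi, r))
    return min([taxi] + GRAPH[taxi], key=lambda n: dist(n, target))

def step(requests, actions):
    """Move taxis to their chosen nodes; a request is served when a taxi reaches it."""
    taxis = tuple(actions)
    remaining = frozenset(requests) - set(taxis)
    return taxis, remaining, len(remaining)  # cost = requests still waiting

def rollout_cost(requests, first_actions, horizon):
    """Cost of applying first_actions now, then following the base policy for `horizon` steps."""
    taxis, requests, total = step(requests, first_actions)
    for _ in range(horizon):
        actions = [base_policy(t, requests) for t in taxis]
        taxis, requests, cost = step(requests, actions)
        total += cost
    return total

def one_at_a_time_rollout(taxis, requests, horizon=5):
    """Improve the joint action one agent at a time instead of searching the joint space."""
    actions = [base_policy(t, requests) for t in taxis]  # start from base-policy actions
    for i, taxi in enumerate(taxis):
        candidates = [taxi] + GRAPH[taxi]  # stay, or move to a neighbor
        # Agents before i keep their improved actions; agents after i keep base actions.
        actions[i] = min(
            candidates,
            key=lambda a: rollout_cost(requests, actions[:i] + [a] + actions[i + 1:], horizon),
        )
    return actions

taxis, requests = (2, 2), frozenset({1, 4})
print([base_policy(t, requests) for t in taxis])  # base policy sends both taxis to node 1
print(one_at_a_time_rollout(taxis, requests))     # rollout splits them between the requests
```

In this toy case the base policy sends both taxis toward the same nearby request, while the rollout step redirects the first taxi toward the other one. Evaluating agents sequentially keeps the number of simulated candidates linear in the number of agents rather than exponential, which is the usual motivation for this style of rollout.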
