Data-Efficient Multi-Agent Spatial Planning with LLMs

Huangyuan Su, Aaron Walsman, Daniel Garces, Sham Kakade, Stephanie Gil

TL;DR

This work treats multi-agent taxi routing on a graph as a data-efficient planning problem and demonstrates that pretrained LLMs can serve as effective base policies for spatial planning. By combining zero-shot prompting, grounding of graph information, and a rollout-based offline/online learning framework, the approach achieves state-of-the-art performance with far fewer environment interactions than prior methods. Finetuning via one-at-a-time rollout substantially improves planning quality and mitigates spatial hallucinations, while a range of prompting strategies reveals trade-offs between accuracy and computation. The results show strong generalization to larger maps and more agents, highlighting the potential for LLMs to accelerate data-efficient planning in dynamic, multi-agent logistics tasks. The work also outlines limitations such as inference speed and hallucination risks, and points to future directions including predictive value functions and broader benchmarks.

Abstract

In this project, our goal is to determine how to leverage the world knowledge of pretrained large language models for efficient and robust learning in multi-agent decision making. We examine this in a taxi routing and assignment problem where agents must decide how best to pick up passengers in order to minimize overall waiting time. Although this problem is situated on a graphical road network, we show that with proper prompting, zero-shot performance is quite strong on this task. Furthermore, with limited fine-tuning combined with the one-at-a-time rollout algorithm for lookahead, LLMs can outperform existing approaches with 50 times fewer environment interactions. We also explore the benefits of various linguistic prompting approaches and show that including certain easy-to-compute information in the prompt significantly improves performance. Finally, we highlight the LLM's built-in semantic understanding, showing its ability to adapt to environmental factors through simple prompts.

Paper Structure

This paper contains 32 sections, 2 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: In this multi-agent setting, taxis must decide where to move and pick up passengers in order to minimize overall passenger wait time. At each time step, taxis can move to any neighboring intersection, where arrows indicate one-way road segments. In this example there are two outstanding requests, one in the top left and another in the bottom right. While the green taxi is closest to the request in the bottom right, the task of allocating the red and yellow taxis is more challenging. Both are four stops away from the request in the top left, but which one is sent, and where the other moves in the meantime, will determine how prepared the taxis are for picking up future requests.
  • Figure 2: We use multi-agent rollout (Bertsekas, 2021) for policy improvement. In this setting, we compute an updated action for each agent by running many rollouts for each possible action it could take, and we choose the action resulting in the lowest average cost. To avoid computing rollouts for all combinations in the exponential joint multi-agent action space, agent actions are computed one at a time. In the example shown here, we are estimating an improved action for the red taxi. Each of its actions is combined with the best action found for the green taxi, which has already been computed, and an action chosen by the base policy for the yellow taxi, which has not yet been updated by rollout.
  • Figure 3: Here we show two policies in the same scenario: greedy on the top row, and rollout using the LLM as the base policy on the bottom row. Each column shows a time range and the location of each taxi at the end of this range. Colored lines show the route taken by each taxi during this interval. In the first three frames on the far left, the LLM already starts to move the taxis into a more central location. The first request (blue) arrives in the center of the map on frame 3, and the LLM picks it up one step earlier. Eleven frames later, on frame 14, two more requests (orange and purple) appear, then a third (pink) one frame later. While the LLM is slower to pick up the purple request, it picks up the orange and pink requests much faster and wins by three points overall.
  • Figure 4: Cost (total waiting time) versus the number of MC futures sampled under low, medium, and high load levels.
  • Figure 5: Average cost of each method over the test set under the low load level. Error bars show the standard deviation.
  • ...and 2 more figures
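
The one-at-a-time rollout described in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the five-node road graph, the random request-arrival model, the per-step cost (number of requests still waiting), and all function names are hypothetical, and a greedy nearest-request heuristic stands in for the LLM base policy. Reusing the same random seeds across candidate actions is also a choice made here so that candidates are compared on identical sampled futures.

```python
import random
from collections import deque

# Hypothetical toy road network: node -> neighboring nodes (a 5-node line).
GRAPH = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

def dist(src, dst):
    """Hop count from src to dst via breadth-first search."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nxt in GRAPH[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return float("inf")

def actions(pos):
    """A taxi may stay in place or move to a neighboring node."""
    return [pos] + GRAPH[pos]

def base_policy(pos, requests):
    """Stand-in for the LLM base policy: step toward the nearest request."""
    if not requests:
        return pos
    target = min(requests, key=lambda r: dist(pos, r))
    return min(actions(pos), key=lambda a: dist(a, target))

def step(requests, joint_action, rng, arrival_prob):
    """Move taxis, serve requests at taxi locations, then sample an arrival."""
    taxis = list(joint_action)
    requests = [r for r in requests if r not in taxis]
    cost = len(requests)  # each waiting request costs one unit per step
    if rng.random() < arrival_prob:
        requests.append(rng.choice(list(GRAPH)))
    return taxis, requests, cost

def simulate(requests, first_action, horizon, rng, arrival_prob):
    """Apply first_action, then follow the base policy; return total cost."""
    taxis, requests, total = step(requests, first_action, rng, arrival_prob)
    for _ in range(horizon - 1):
        joint = [base_policy(p, requests) for p in taxis]
        taxis, requests, cost = step(requests, joint, rng, arrival_prob)
        total += cost
    return total

def one_at_a_time_rollout(taxis, requests, horizon=5, n_futures=20,
                          arrival_prob=0.3, seed=0):
    """Improve each agent's action in turn instead of searching the joint space."""
    # Agents not yet optimized keep their base-policy action.
    joint = [base_policy(p, requests) for p in taxis]
    for i, pos in enumerate(taxis):
        best_action, best_cost = joint[i], float("inf")
        for a in actions(pos):
            # Earlier agents use their improved actions, later ones the base policy.
            candidate = joint[:i] + [a] + joint[i + 1:]
            costs = [simulate(requests, candidate, horizon,
                              random.Random(seed + k), arrival_prob)
                     for k in range(n_futures)]
            avg = sum(costs) / len(costs)
            if avg < best_cost:
                best_action, best_cost = a, avg
        joint[i] = best_action
    return joint

if __name__ == "__main__":
    print(one_at_a_time_rollout(taxis=[0, 4], requests=[1]))
```

With this scheme the number of candidate evaluations grows with the sum of the agents' action counts rather than their product, which is the saving the caption refers to. The `n_futures` parameter plays the role of the number of sampled Monte Carlo futures mentioned in the Figure 4 caption.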