
Deploying Ten Thousand Robots: Scalable Imitation Learning for Lifelong Multi-Agent Path Finding

He Jiang, Yutong Wang, Rishi Veerapaneni, Tanishq Duhan, Guillaume Sartoretti, Jiaoyang Li

TL;DR

LMAPF is a challenging setting: thousands of agents must avoid collisions while continually receiving new goals. The authors introduce SILLM, a scalable imitation-learning framework that combines a Spatially Sensitive Communication policy, three global guidance heuristics, and a Learnable PIBT collision resolver, trained by imitating a scalable Windowed MAPF-LNS solver. Across six large maps with up to 10,000 agents, SILLM improves average throughput by 137.7% and 16.0% over the best learning- and search-based baselines, respectively, keeps per-step planning under one second on GPUs, and in some cases surpasses the winner of the 2023 League of Robot Runners. Validation with 10 real and 100 virtual robots in a mock warehouse supports its practicality. The work demonstrates the potential of learning-based methods for large-scale LMAPF and outlines a path for further improvement through reinforcement learning.
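PIBT (Priority Inheritance with Backtracking) resolves collisions one timestep at a time: agents are processed in priority order, each tries its preferred next cells, and when a higher-priority agent wants a cell that is currently occupied, the occupant inherits that priority and must move first, with backtracking if it cannot. The following is a minimal, generic PIBT sketch on a grid, assuming each agent supplies a ranked list of candidate cells through a hypothetical `preferences` callback. One plausible reading of a "learnable" PIBT is that this ranking comes from the network's action scores instead of a distance heuristic, but the names and structure here are illustrative and not the authors' implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def pibt_step(
    positions: List[Cell],
    priorities: List[float],
    preferences: Callable[[int], List[Cell]],
) -> List[Cell]:
    """Compute one collision-free joint move with PIBT (illustrative sketch).

    preferences(i) returns agent i's candidate next cells, best first; it should
    contain only free neighbours plus the agent's current cell (the wait action).
    """
    n = len(positions)
    next_pos: List[Optional[Cell]] = [None] * n
    occupant: Dict[Cell, int] = {p: i for i, p in enumerate(positions)}
    reserved: Dict[Cell, int] = {}  # next-step cell -> agent that claimed it

    def plan(i: int, pusher: Optional[int]) -> bool:
        for cell in preferences(i):
            if cell in reserved:
                continue  # vertex conflict: cell already claimed for next step
            if pusher is not None and cell == positions[pusher]:
                continue  # swap conflict with the agent that is pushing us
            reserved[cell] = i
            next_pos[i] = cell
            j = occupant.get(cell)
            if j is not None and j != i and next_pos[j] is None:
                # Priority inheritance: the current occupant must move out first.
                if not plan(j, i):
                    # Occupant is stuck and now holds `cell`; try our next option.
                    next_pos[i] = None
                    continue
            return True
        # No candidate worked: wait in place and report failure to the pusher.
        next_pos[i] = positions[i]
        reserved[positions[i]] = i
        return False

    # Highest-priority agents plan first; pushed agents are planned recursively.
    for i in sorted(range(n), key=lambda k: -priorities[k]):
        if next_pos[i] is None:
            plan(i, None)
    return [p if p is not None else positions[k] for k, p in enumerate(next_pos)]
```

For example, with agent 0 at (0, 0) wanting (0, 1) and a lower-priority agent 1 waiting at (0, 1) with (0, 2) free, `pibt_step` pushes agent 1 to (0, 2) so agent 0 can advance. In full PIBT, priorities are also updated between timesteps (for instance, raised for agents that have not yet reached their goals); that bookkeeping is omitted here.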

Abstract

Lifelong Multi-Agent Path Finding (LMAPF) repeatedly finds collision-free paths for multiple agents that are continually assigned new goals when they reach current ones. Recently, this field has embraced learning-based methods, which reactively generate single-step actions based on individual local observations. However, it is still challenging for them to match the performance of the best search-based algorithms, especially in large-scale settings. This work proposes an imitation-learning-based LMAPF solver that introduces a novel communication module as well as systematic single-step collision resolution and global guidance techniques. Our proposed solver, Scalable Imitation Learning for LMAPF (SILLM), inherits the fast reasoning speed of learning-based methods and the high solution quality of search-based methods with the help of modern GPUs. Across six large-scale maps with up to 10,000 agents and varying obstacle structures, SILLM surpasses the best learning- and search-based baselines, achieving average throughput improvements of 137.7% and 16.0%, respectively. Furthermore, SILLM also beats the winning solution of the 2023 League of Robot Runners, an international LMAPF competition. Finally, we validated SILLM with 10 real robots and 100 virtual robots in a mock warehouse environment.
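The headline results are relative throughput gains. Throughput in LMAPF is commonly measured as the number of goals reached per timestep over a fixed horizon, with each agent immediately receiving a new goal on arrival. The sketch below assumes that definition and uses a hypothetical single-step planner `step_fn` and task generator `next_goal`; it shows the lifelong loop and how a percentage improvement over a baseline is computed, and does not reproduce the paper's benchmark setup or how it averages across maps.

```python
from typing import Callable, List, Tuple

Cell = Tuple[int, int]
# Hypothetical planner: (current positions, current goals) -> positions after one step.
Planner = Callable[[List[Cell], List[Cell]], List[Cell]]


def lifelong_throughput(
    start: List[Cell],
    step_fn: Planner,
    next_goal: Callable[[int], Cell],
    num_timesteps: int,
) -> float:
    """Run a lifelong MAPF episode and return goals reached per timestep."""
    positions = list(start)
    goals = [next_goal(i) for i in range(len(positions))]
    completed = 0
    for _ in range(num_timesteps):
        positions = step_fn(positions, goals)
        for i, pos in enumerate(positions):
            if pos == goals[i]:
                completed += 1
                goals[i] = next_goal(i)  # continual goal reassignment
    return completed / num_timesteps


def relative_improvement(ours: float, baseline: float) -> float:
    """Percentage throughput gain of `ours` over `baseline`."""
    return 100.0 * (ours - baseline) / baseline
```

Under this definition, a 137.7% improvement means the solver reaches about 2.377 times as many goals per timestep as the baseline on the same instance.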

Paper Structure

This paper contains 23 sections, 3 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Comparison of mean throughput and mean planning time per step between our solver, SILLM, and other state-of-the-art search- and learning-based solvers on 6 maps with 10,000 agents. The (L) and (S) in the legend denote learning-based and search-based solvers, respectively. Details are given in the main results table.
  • Figure 2: Core network structure. The global state contains all static obstacles (black squares) and agents (colored circles). In this example, an agent's FoV is 3×3. The orange agent's unnormalized obstacle and heuristic feature maps are shown in the upper-right and lower-right corners.
  • Figure 3: Data collection procedure using Windowed MAPF-LNS.
  • Figure 4: Comparison of Learnable-PIBT (L-PIBT) and PIBT with different global guidance on large instances. The radar plot shows the score for each instance. The table reports the average score and the average planning time per timestep.
  • Figure 5: Ablation studies on large maps.
  • ...and 4 more figures
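Figure 2 describes each agent's input as FoV-sized crops of an obstacle map and a heuristic map centered on the agent. A minimal sketch of building such unnormalized crops is shown below, assuming that the heuristic channel is the BFS shortest-path distance to the agent's goal and that cells outside the grid are padded as obstacles with infinite distance. Both are assumptions; the paper's exact feature definitions and normalization are not given here, and the function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Cell = Tuple[int, int]
Grid = List[List[int]]  # 1 = static obstacle, 0 = free cell


def bfs_distances(grid: Grid, goal: Cell) -> List[List[float]]:
    """Shortest-path distance from every free cell to `goal` (inf if unreachable)."""
    h, w = len(grid), len(grid[0])
    inf = float("inf")
    dist = [[inf] * w for _ in range(h)]
    dist[goal[0]][goal[1]] = 0.0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and grid[nr][nc] == 0 and dist[nr][nc] == inf:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def fov_features(
    grid: Grid, dist: List[List[float]], agent: Cell, fov: int
) -> Tuple[Grid, List[List[float]]]:
    """Crop fov x fov obstacle and heuristic maps centered on `agent`."""
    assert fov % 2 == 1, "FoV side length is assumed to be odd"
    h, w = len(grid), len(grid[0])
    half = fov // 2
    obstacle: Grid = []
    heuristic: List[List[float]] = []
    for r in range(agent[0] - half, agent[0] + half + 1):
        obs_row: List[int] = []
        heu_row: List[float] = []
        for c in range(agent[1] - half, agent[1] + half + 1):
            inside = 0 <= r < h and 0 <= c < w
            obs_row.append(grid[r][c] if inside else 1)  # pad outside as obstacle
            heu_row.append(dist[r][c] if inside else float("inf"))
        obstacle.append(obs_row)
        heuristic.append(heu_row)
    return obstacle, heuristic
```

Computing `bfs_distances` once per goal and reusing it across timesteps keeps per-step feature extraction to a cheap crop, which matters when thousands of agents are observed every step.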