MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale

Anton Andreychuk; Konstantin Yakovlev; Aleksandr Panov; Alexey Skrynnik

TL;DR

MAPF-GPT addresses scalable multi-agent pathfinding by learning a decentralized policy through imitation learning from a large expert dataset. It is a transformer-based foundation model trained on $10^9$ observation-action pairs, using a $67$-token MAPF-specific vocabulary and a context length of $256$. The model demonstrates zero-shot generalization to unseen maps and outperforms state-of-the-art learnable MAPF solvers across several benchmarks while remaining efficient at inference time. The work also provides an extensive ablation study, Lifelong MAPF results, and a discussion of practical limitations such as the lack of theoretical guarantees and the computational cost of training.

Abstract

Multi-agent pathfinding (MAPF) is the problem of finding collision-free paths for multiple agents in a shared environment. Solving MAPF optimally, even under restrictive assumptions, is NP-hard, yet efficient solutions for this problem are critical for numerous applications, such as automated warehouses and transportation systems. Recently, learning-based approaches to MAPF have gained attention, particularly those leveraging deep reinforcement learning. Typically, such learning-based MAPF solvers are augmented with additional components like single-agent planning or communication. Orthogonally, in this work we rely solely on imitation learning that leverages a large dataset of expert MAPF solutions and a transformer-based neural network to create a foundation model for MAPF called MAPF-GPT. The latter is capable of generating actions without additional heuristics or communication. MAPF-GPT demonstrates zero-shot learning abilities when solving MAPF problem instances that are not present in the training dataset. We show that MAPF-GPT notably outperforms the current best-performing learnable MAPF solvers on a diverse range of problem instances and is computationally efficient during inference.

Paper Structure (21 sections, 4 equations, 10 figures, 6 tables)


Multi-agent pathfinding
Offline reinforcement learning
Multi-agent imitation learning (MAIL)
Multi-agent pathfinding
MAPF as a sequential decision-making problem
Imitation learning
Creating MAPF Scenarios
Generating Ground Truth Data
Tokenization
Model Training
Training protocol
Main results
Ablation study
LifeLong MAPF
Runtime
...and 6 more sections

Figures (10)

Figure 1: The general pipeline of the MAPF-GPT: (1) Creating MAPF scenarios. (2) Generating ground truth data, i.e. MAPF solutions using an expert solver. (3) Transforming the solutions to the observation-action pairs and tokenization of the observations, which converts them into a format suitable for transformer architectures. (4) Executing the training loop, where observation/action pairs are sampled from the dataset, and the model is trained using cross-entropy loss.
Figure 2: The tokenization process for the MAPF-GPT model uses a vocabulary of 67 tokens, with an input of 256 tokens. Fewer tokens are shown for clarity and visibility.
Figure 3: Success rate of the evaluated MAPF solvers on different maps. The shaded area indicates $95\%$ confidence intervals.
Figure 4: Quality of the obtained solutions relative to the ones of LaCAM (lower is better).
Figure 5: Runtime of MAPF-GPT, DCC, and SCRIMP models on the Warehouse map. The plot shows the average time required to decide the next action for all agents as the number of agents increases.
...and 5 more figures
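
Step (4) of the pipeline in Figure 1 (sampling observation/action pairs and training with cross-entropy loss) can be illustrated with a minimal sketch. The vocabulary size and context length come from the text above; everything else is an assumption made for illustration: the five-action space (wait plus four moves), the random toy dataset, and the `toy_policy` stand-in, which returns uniform logits instead of running a transformer. This is not the authors' implementation.

```python
import math
import random

VOCAB_SIZE = 67     # token vocabulary size stated in the paper
CONTEXT_LEN = 256   # tokens per observation stated in the paper
NUM_ACTIONS = 5     # assumption: wait + four grid moves


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(batch_logits, expert_actions):
    """Mean negative log-likelihood of the expert's actions."""
    total = 0.0
    for logits, action in zip(batch_logits, expert_actions):
        total -= log_softmax(logits)[action]
    return total / len(expert_actions)


def sample_batch(dataset, batch_size, rng):
    """Draw observation/action pairs uniformly with replacement."""
    return [dataset[rng.randrange(len(dataset))] for _ in range(batch_size)]


def toy_policy(tokens):
    """Hypothetical stand-in for the model: uniform logits per action."""
    return [0.0] * NUM_ACTIONS


rng = random.Random(0)
# Toy dataset of (tokenized observation, expert action index) pairs.
dataset = [
    ([rng.randrange(VOCAB_SIZE) for _ in range(CONTEXT_LEN)],
     rng.randrange(NUM_ACTIONS))
    for _ in range(32)
]

batch = sample_batch(dataset, 8, rng)
logits = [toy_policy(obs) for obs, _ in batch]
actions = [act for _, act in batch]
loss = cross_entropy(logits, actions)
print(f"cross-entropy loss: {loss:.4f}")
```

With uniform logits the loss equals the log of the number of actions, which is the baseline an untrained policy starts from; a real training loop would backpropagate this loss through the model and update its weights.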
