Sequence Pathfinder for Multi-Agent Pickup and Delivery in the Warehouse

Zeyuan Zhao; Chaoran Li; Shao Zhang; Ying Wen

TL;DR

The paper addresses Multi-Agent Pickup and Delivery (MAPD) in warehouse-style environments by reframing Multi-Agent Path Finding (MAPF) as a sequence modeling problem and proving order-invariant optimality for autoregressive pathfinding policies. It introduces SePar, a Transformer-based Sequential Pathfinder that enables implicit inter-agent information exchange and reduces decision complexity from exponential to linear. SePar combines PPO-based reinforcement learning with imitation learning and uses an Observation Feature Extractor and a Multi-Agent Transformer to generate joint actions. Empirical results on both a warehouse simulator and the POGEMA MAPF benchmarks show that SePar outperforms most learning-based baselines and generalizes to unseen maps, and they indicate that imitation learning is essential for highly structured maps. The work targets scalable, globally informed MAPF/MAPD planning for warehouse robotics and multi-robot coordination.

Abstract

Multi-Agent Pickup and Delivery (MAPD) is a challenging extension of Multi-Agent Path Finding (MAPF), where agents are required to sequentially complete tasks with fixed-location pickup and delivery demands. Although learning-based methods have made progress in MAPD, they often perform poorly in warehouse-like environments with narrow pathways and long corridors when relying only on local observations for distributed decision-making. Communication learning can alleviate the lack of global information but introduces high computational complexity due to point-to-point communication. To address this challenge, we formulate MAPF as a sequence modeling problem and prove that path-finding policies under sequence modeling possess order-invariant optimality, ensuring its effectiveness in MAPD. Building on this, we propose the Sequential Pathfinder (SePar), which leverages the Transformer paradigm to achieve implicit information exchange, reducing decision-making complexity from exponential to linear while maintaining efficiency and global awareness. Experiments demonstrate that SePar consistently outperforms existing learning-based methods across various MAPF tasks and their variants, and generalizes well to unseen environments. Furthermore, we highlight the necessity of integrating imitation learning in complex maps like warehouses.
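
One way to read the "exponential to linear" claim is as a count of candidate actions: a policy that scores whole joint actions faces |A|^n options for n agents, while a policy that lets agents decide one after another, each conditioned on the actions already chosen, scores only n·|A| options. The sketch below illustrates that reading with a greedy autoregressive decoder. It is a toy under stated assumptions, not SePar itself: the five-move action set, the function names, and `toy_score` are all hypothetical, and the paper's model is a learned Transformer rather than a hand-written scoring rule.

```python
# Assumed action set for a 4-connected grid world (not taken from the paper).
ACTIONS = ["stay", "up", "down", "left", "right"]


def joint_candidates(n_agents):
    """Joint actions a policy must consider when it scores all agents at once."""
    return len(ACTIONS) ** n_agents


def sequential_candidates(n_agents):
    """Per-agent action scores needed when agents decide one after another."""
    return len(ACTIONS) * n_agents


def decode_sequentially(n_agents, score):
    """Greedy autoregressive decoding.

    Agent i picks the action with the highest score given the actions
    already chosen by agents 0..i-1 (the "prefix").
    """
    chosen = []
    for i in range(n_agents):
        best = max(ACTIONS, key=lambda a: score(i, tuple(chosen), a))
        chosen.append(best)
    return chosen


def toy_score(agent_index, prefix, action):
    """Hypothetical scoring rule: avoid any action an earlier agent already took."""
    return 0.0 if action in prefix else 1.0


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, joint_candidates(n), sequential_candidates(n))
    print(decode_sequentially(3, toy_score))
```

Because each agent sees the prefix of earlier choices, information about other agents reaches it without any explicit point-to-point messages, which is the intuition behind the implicit information exchange described above.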
