Learning-guided Prioritized Planning for Lifelong Multi-Agent Path Finding in Warehouse Automation

Han Zheng, Yining Ma, Brandon Araki, Jingkai Chen, Cathy Wu

Abstract

Lifelong Multi-Agent Path Finding (MAPF) is critical for modern warehouse automation, which requires multiple robots to continuously navigate conflict-free paths to optimize overall system throughput. However, the complexity of warehouse environments and the long-term dynamics of lifelong MAPF often demand costly adaptations to classical search-based solvers. While machine learning methods have been explored, their superiority over search-based methods remains inconclusive. In this paper, we introduce Reinforcement Learning (RL) guided Rolling Horizon Prioritized Planning (RL-RH-PP), the first framework integrating RL with search-based planning for lifelong MAPF. Specifically, we leverage classical Prioritized Planning (PP) as a backbone for its simplicity and flexibility in integrating with a learning-based priority assignment policy. By formulating dynamic priority assignment as a Partially Observable Markov Decision Process (POMDP), RL-RH-PP exploits the sequential decision-making nature of lifelong planning while delegating complex spatial-temporal interactions among agents to reinforcement learning. An attention-based neural network autoregressively decodes priority orders on the fly, enabling efficient sequential single-agent planning by the PP planner. Evaluations in realistic warehouse simulations show that RL-RH-PP achieves the highest total throughput among baselines and generalizes effectively across agent densities, planning horizons, and warehouse layouts. Our interpretive analyses reveal that RL-RH-PP proactively prioritizes congested agents and strategically redirects agents from congestion, easing traffic flow and boosting throughput. These findings highlight the potential of learning-guided approaches to augment traditional heuristics in modern warehouse automation.

Paper Structure (54 sections, 16 equations, 26 figures, 8 tables, 1 algorithm)

This paper contains 54 sections, 16 equations, 26 figures, 8 tables, 1 algorithm.

Figures (26)

  • Figure 1: Robot fleet routing in a Symbotic warehouse (source: https://www.symbotic.com).
  • Figure 2: The framework of our proposed RL-RH-PP. At each planning step, the RL policy encodes the shortest path of each agent into agent embeddings and autoregressively decodes a total of $K$ priority orders. These orders are passed to the RH-PP planner to compute conflict-free paths, which are then executed in the warehouse simulation. Feedback rewards are used to update the policy. This closed-loop interaction enables the RL policy to learn to generate effective priority orders during training.
  • Figure 3: Our proposed encoding process for RL-RH-PP. Raw path inputs are indexed using position embeddings to construct a learnable representation for agent paths. Temporal and spatial attention mechanisms are applied in turn, along with normalization and residual modules, to extract and refine agent embeddings.
  • Figure 4: Our proposed decoding process for RL-RH-PP. Agent embeddings from the encoder are fed into a self-attention mechanism to generate a sampling distribution. To construct each priority order, agents are sampled and added one by one in an autoregressive manner until all $N$ agents are assigned. This process can be executed in parallel to generate a set of $K$ promising orders efficiently.
  • Figure 5: Amazon fulfillment center dense map (obstacle density = 15.3$\%$), modified from li2021lifelong.
  • ...and 21 more figures
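
The autoregressive decoding described in the captions of Figures 2 and 4 can be illustrated with a minimal sketch. It is not the paper's network: the per-agent `scores` are hypothetical fixed logits standing in for the decoder output (a learned policy would presumably recompute them after each selection), and the function names are introduced here only for illustration. The sketch shows the masking idea: an agent that has already been placed is excluded from later sampling steps, so each sample is a permutation of all $N$ agents, and the procedure is repeated to obtain $K$ candidate orders.

```python
import math
import random


def sample_priority_order(scores, rng):
    """Sample one priority order over all agents, one agent at a time.

    `scores` is a hypothetical list of per-agent logits; it is an
    assumption of this sketch, not the output of the paper's decoder.
    """
    remaining = list(range(len(scores)))
    order = []
    while remaining:
        # Softmax restricted to agents not yet placed (the masking step).
        peak = max(scores[i] for i in remaining)
        weights = [math.exp(scores[i] - peak) for i in remaining]
        chosen = rng.choices(remaining, weights=weights, k=1)[0]
        order.append(chosen)
        remaining.remove(chosen)
    return order


def sample_orders(scores, k, seed=0):
    """Draw k candidate priority orders (sequentially here, for simplicity)."""
    rng = random.Random(seed)
    return [sample_priority_order(scores, rng) for _ in range(k)]


orders = sample_orders([2.0, 0.5, -1.0, 0.0], k=3)
```

Each element of `orders` is a permutation of the agent indices 0 to 3; agents with larger logits tend to appear earlier, but any order has nonzero probability.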

Theorems & Definitions (4)

  • Definition 1: MAPF
  • Definition 2: Lifelong MAPF
  • Definition 3: Priority Order
  • Definition 4: Total Priority Order
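
The definitions above are listed by title only, so the following is a sketch under stated assumptions rather than the paper's formulation. It assumes a total priority order is a permutation of the agents and shows how classical Prioritized Planning might consume one: agents are planned one at a time, and each treats the paths of higher-priority agents as moving obstacles. The single-agent search is a plain space-time breadth-first search on a 4-connected grid, chosen for brevity; the rolling-horizon logic of RH-PP is omitted, and all function names and the `horizon` parameter are hypothetical.

```python
from collections import deque

MOVES = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))  # wait, down, up, right, left


def plan_single_agent(grid, start, goal, cells, moves, horizon):
    """Space-time BFS that avoids reservations of higher-priority agents."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    parent = {(start, 0): None}
    while queue:
        cell, t = queue.popleft()
        # Accept the goal only if the agent can rest there until the horizon.
        if cell == goal and all((goal, u) not in cells
                                for u in range(t, horizon + 1)):
            path, node = [], (cell, t)
            while node is not None:
                path.append(node[0])
                node = parent[node]
            return path[::-1]
        if t == horizon:
            continue
        for dr, dc in MOVES:
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]]:          # static obstacle
                continue
            if (nxt, t + 1) in cells:         # vertex conflict
                continue
            if (nxt, cell, t) in moves:       # swap (edge) conflict
                continue
            if (nxt, t + 1) not in parent:
                parent[(nxt, t + 1)] = (cell, t)
                queue.append((nxt, t + 1))
    return None


def prioritized_plan(grid, starts, goals, order, horizon=20):
    """Plan agents in the given order; return None if any agent fails."""
    cells, moves, paths = set(), set(), {}
    for agent in order:
        path = plan_single_agent(grid, starts[agent], goals[agent],
                                 cells, moves, horizon)
        if path is None:
            return None
        paths[agent] = path
        # Reserve every visited cell; the agent stays at its goal afterwards.
        for t in range(horizon + 1):
            cells.add((path[min(t, len(path) - 1)], t))
        for t in range(len(path) - 1):
            moves.add((path[t], path[t + 1], t))
    return paths


grid = [[0, 0, 0],
        [0, 0, 0]]  # 0 = free cell, 1 = obstacle
starts = {0: (0, 0), 1: (0, 2)}
goals = {0: (0, 2), 1: (0, 0)}
paths = prioritized_plan(grid, starts, goals, order=[0, 1])
```

In this example agent 0 has the higher priority and takes the direct route along the top row, while agent 1 has to detour through the bottom row. Because Prioritized Planning is incomplete, a different order can yield longer paths or no solution at all, which is why the choice of priority order matters.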