Online Guidance Graph Optimization for Lifelong Multi-Agent Path Finding

Hongzhi Zang, Yulun Zhang, He Jiang, Zhe Chen, Daniel Harabor, Peter J. Stuckey, Jiaoyang Li

TL;DR

This work tackles lifelong MAPF by learning an online guidance policy that updates a guidance graph to adapt edge costs in real time, aiming to improve throughput for PIBT-based LMAPF. It introduces two integration pipelines, the Direct Planning and Guide-Path Planning approaches, in which the policy $\boldsymbol{\theta}$ maps observations to edge weights on $G_g(V_g, E_g, \boldsymbol{\omega})$, and it optimizes $\boldsymbol{\theta}$ with CMA-ES using LMAPF simulators. Empirical results across multiple maps and dynamic task distributions show that online guidance outperforms offline guidance and human-designed online policies, with throughput improvements of up to 30.75% over offline guidance and up to 52.42% over handcrafted policies in certain settings; LNS further enhances performance at modest runtime cost. The findings highlight the practical potential of dynamic guidance in large-scale robotic systems and suggest directions for broader application and efficiency improvements in online policy optimization.
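
The optimization loop summarized above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the linear form of `policy`, the `toy_throughput` objective (a stand-in for an LMAPF simulator, not a real one), and the elite-averaging evolution strategy, which is a simpler relative of CMA-ES that does not adapt a covariance matrix. It shows only the general shape of the idea: sample candidate parameters $\boldsymbol{\theta}$, score the edge weights each candidate produces, and move the search distribution toward the best candidates.

```python
import random

def policy(theta, observation):
    """Hypothetical linear policy: one positive weight per observed edge."""
    a, b = theta
    return [max(0.1, a * o + b) for o in observation]

def toy_throughput(weights, observation):
    """Toy stand-in for an LMAPF simulator (NOT a real simulator).

    It scores higher when weights grow with observed traffic.
    """
    target = [1.0 + 2.0 * o for o in observation]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

def optimize(observation, generations=200, population=20, elite=5,
             sigma=0.5, seed=0):
    """Elite-averaging evolution strategy (a simplification, not CMA-ES)."""
    rng = random.Random(seed)
    mean = [0.0, 0.0]
    for _ in range(generations):
        # Sample candidate parameter vectors around the current mean.
        samples = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                   for _ in range(population)]
        # Rank candidates by the score of the edge weights they produce.
        samples.sort(
            key=lambda th: toy_throughput(policy(th, observation), observation),
            reverse=True,
        )
        best = samples[:elite]
        # Move the mean toward the elite and slowly shrink the step size.
        mean = [sum(th[i] for th in best) / elite for i in range(len(mean))]
        sigma *= 0.98
    return mean

# Per-edge traffic observations (made-up numbers for the example).
observation = [0.0, 0.5, 1.0, 2.0]
theta = optimize(observation)
print("optimized theta:", theta)
print("edge weights:", policy(theta, observation))
```

In the setting described above, the score would come from running LMAPF simulations with the candidate policy rather than from a closed-form function, which makes each evaluation far more expensive than in this sketch.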

Abstract

We study the problem of optimizing a guidance policy capable of dynamically guiding the agents for lifelong Multi-Agent Path Finding based on real-time traffic patterns. Multi-Agent Path Finding (MAPF) focuses on moving multiple agents from their starts to goals without collisions. Its lifelong variant, LMAPF, continuously assigns new goals to agents. In this work, we focus on improving the solution quality of PIBT, a state-of-the-art rule-based LMAPF algorithm, by optimizing a policy to generate adaptive guidance. We design two pipelines that incorporate guidance into PIBT in different ways. We demonstrate the superiority of the optimized policy over both static guidance and human-designed policies. Additionally, we explore scenarios where the task distribution changes over time, a challenging yet common situation in real-world applications that is rarely explored in the literature.

Paper Structure

This paper contains 24 sections, 2 equations, 11 figures, 6 tables, 1 algorithm.

Figures (11)

  • Figure 1: Comparison of no guidance, offline guidance (Zhang et al., 2024), and our online guidance in simulations of 5,000 timesteps with 600 agents on a $33 \times 57$ warehouse map with 1,091 non-obstacle cells. The average and standard deviation of throughput over 10 simulations for no guidance, offline guidance, and online guidance are $3.18 \pm 0.04$, $6.42 \pm 0.09$, and $8.66 \pm 0.04$, respectively. The heatmaps show the number of times agents take the wait action in each cell, which approximates the level of congestion. Our online guidance results in the most balanced traffic and thus less congestion and higher throughput.
  • Figure 2: Overview of incorporating guidance policy with Direct Planning algorithms like PIBT.
  • Figure 3: Overview of incorporating guidance policy with Guide-Path Planning algorithms like GPIBT.
  • Figure 4: Throughput with different numbers of agents. The black vertical lines show the number of agents that are used to optimize the guidance policies. The solid line shows the average throughput over 50 LMAPF simulations, and the shaded areas denote the 95% confidence interval. "s" and "d" stand for static and dynamic task distribution, respectively.
  • Figure 5: GPIBT with LNS results. The notation of this figure is similar to that in Figure 4.
  • ...and 6 more figures

Theorems & Definitions (3)

  • Definition 1: Lifelong MAPF (LMAPF)
  • Definition 2: Guidance Graph
  • Definition 3: Guidance Policy
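
The outline above only names these definitions, so the following is a rough sketch under stated assumptions rather than the paper's formalization. It assumes a guidance graph can be represented as a directed 4-connected grid with one weight per edge (the dictionary `omega`, a hypothetical name), and that a planner prefers cheaper edges. Plain Dijkstra search stands in for the planner here; it is not PIBT. The example shows how raising a single edge weight, as a guidance policy might do for a congested edge, steers the cheapest path elsewhere.

```python
import heapq

def grid_guidance_graph(rows, cols, default_weight=1.0):
    """Directed 4-connected grid with one weight per edge (assumed form)."""
    omega = {}
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    omega[((r, c), (nr, nc))] = default_weight
    return omega

def cheapest_path(omega, start, goal):
    """Dijkstra over the weighted edges; assumes goal is reachable."""
    adjacency = {}
    for (u, v), w in omega.items():
        adjacency.setdefault(u, []).append((v, w))
    dist = {start: 0.0}
    parent = {}
    frontier = [(0.0, start)]
    while frontier:
        d, u = heapq.heappop(frontier)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adjacency.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(frontier, (nd, v))
    # Walk parents back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]

omega = grid_guidance_graph(3, 3)
print("uniform weights:", cheapest_path(omega, (1, 0), (1, 2)))

# Pretend a guidance policy marks the first edge of the corridor as congested.
omega[((1, 0), (1, 1))] = 5.0
print("after update:   ", cheapest_path(omega, (1, 0), (1, 2)))
```

With uniform weights the agent goes straight through the middle of the grid; after the update, the first step through the middle costs more than going around it, so the path leaves the start cell in a different direction.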