Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic

Thomy Phan, Benran Zhang, Shao-Hung Chan, Sven Koenig

TL;DR

This work tackles the scalability limits of anytime MAPF by introducing ADDRESS, a single adaptive destroy heuristic for MAPF-LNS. ADDRESS replaces multiple stationary destroy heuristics with restricted Thompson Sampling over the top-$K$ most delayed agents, which selects a seed agent for adaptive LNS neighborhood generation. This yields faster improvement in solution cost and AUC, even in large-scale instances with up to 1000 agents. The authors demonstrate substantial performance gains across five MAPF benchmarks, outperforming MAPF-LNS, BALANCE, and LaCAM*, and show that the approach is robust across parameter choices and time budgets. The method offers a simple yet effective blueprint for online adaptation in combinatorial optimization and could be extended to other domains where high-cost variables drive neighborhood generation.
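The seed-selection step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `AgentStats`, `select_seed_agent`, and `update_seed_agent` are hypothetical, both Beta parameters are assumed to start at 1, and any strict decrease in solution cost is assumed to count as a success. Random-walk neighborhood generation and the repair step are omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class AgentStats:
    """Beta parameters (alpha, beta) kept for one agent; starting at 1 is an assumption."""
    alpha: float = 1.0
    beta: float = 1.0


def select_seed_agent(delays, stats, k, rng):
    """Restricted Thompson Sampling over the top-k most delayed agents.

    delays: per-agent delays, indexed by agent id
    stats:  one AgentStats per agent
    Returns the index j of the chosen seed agent.
    """
    # Sort agent ids by delay (largest first) and keep only the top-k.
    top_k = sorted(range(len(delays)), key=lambda i: delays[i], reverse=True)[:k]
    # Draw q_i ~ Beta(alpha_i, beta_i) for each candidate; the largest sample wins.
    samples = {i: rng.betavariate(stats[i].alpha, stats[i].beta) for i in top_k}
    return max(samples, key=samples.get)


def update_seed_agent(stats, j, old_cost, new_cost):
    """Update only the seed agent's parameters after destroy-and-repair."""
    if new_cost < old_cost:
        stats[j].alpha += 1.0  # assumed success signal: the cost went down
    else:
        stats[j].beta += 1.0   # no improvement


# Usage: five agents, sampling restricted to the K = 3 most delayed ones.
rng = random.Random(0)
delays = [0, 7, 2, 9, 4]
stats = [AgentStats() for _ in delays]
j = select_seed_agent(delays, stats, k=3, rng=rng)  # one of agents 3, 1, 4
update_seed_agent(stats, j, old_cost=22, new_cost=19)
```

Restricting sampling to the top-$K$ set keeps the search on agents that currently contribute most to the sum of delays, while the Beta parameters let agents whose neighborhoods rarely improve the solution lose priority over time.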

Abstract

Anytime multi-agent path finding (MAPF) is a promising approach to scalable path optimization in multi-agent systems. MAPF-LNS, based on Large Neighborhood Search (LNS), is the current state-of-the-art approach where a fast initial solution is iteratively optimized by destroying and repairing selected paths of the solution. Current MAPF-LNS variants commonly use an adaptive selection mechanism to choose among multiple destroy heuristics. However, to determine promising destroy heuristics, MAPF-LNS requires a considerable amount of exploration time. As common destroy heuristics are non-adaptive, any performance bottleneck caused by these heuristics cannot be overcome via adaptive heuristic selection alone, thus limiting the overall effectiveness of MAPF-LNS in terms of solution cost. In this paper, we propose Adaptive Delay-based Destroy-and-Repair Enhanced with Success-based Self-Learning (ADDRESS) as a single-destroy-heuristic variant of MAPF-LNS. ADDRESS applies restricted Thompson Sampling to the top-K set of the most delayed agents to select a seed agent for adaptive LNS neighborhood generation. We evaluate ADDRESS in multiple maps from the MAPF benchmark set and demonstrate cost improvements by at least 50% in large-scale scenarios with up to a thousand agents, compared with the original MAPF-LNS and other state-of-the-art methods.
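For contrast, the adaptive selection mechanism that the abstract attributes to current MAPF-LNS variants can be sketched as a roulette wheel over several destroy heuristics whose weights track recent cost improvements. The class name, the heuristic labels, the initial weights of 1, and the reaction factor `gamma` below are illustrative assumptions; the exponential-smoothing update is a common form and may differ in detail from any specific MAPF-LNS implementation.

```python
import random


class AdaptiveHeuristicSelector:
    """Roulette-wheel choice among several destroy heuristics (illustrative sketch).

    After heuristic h is used, its weight is smoothed toward the observed gain:
        w_h <- gamma * max(0, old_cost - new_cost) + (1 - gamma) * w_h
    """

    def __init__(self, names, gamma=0.1, rng=None):
        self.names = list(names)
        self.weights = [1.0] * len(self.names)  # assumed initial weights
        self.gamma = gamma                      # assumed reaction factor
        self.rng = rng if rng is not None else random.Random()

    def choose(self):
        # Pick heuristic index h with probability w_h / sum(w).
        return self.rng.choices(range(len(self.names)), weights=self.weights, k=1)[0]

    def update(self, h, old_cost, new_cost):
        gain = max(0.0, old_cost - new_cost)
        self.weights[h] = self.gamma * gain + (1 - self.gamma) * self.weights[h]


# Usage with three illustrative heuristic labels.
sel = AdaptiveHeuristicSelector(["random_walk", "intersection", "random"],
                                rng=random.Random(0))
h = sel.choose()
sel.update(h, old_cost=120, new_cost=100)  # weight of h becomes 0.1*20 + 0.9*1 = 2.9
```

Each weight reacts only to the improvements its own heuristic produced, so a selector of this kind may need many iterations before the weights separate, and it cannot fix a heuristic that is itself a bottleneck.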

Paper Structure

This paper contains 34 sections, 6 figures, 1 table, and 2 algorithms.

Figures (6)

  • Figure 1: Scheme of our contribution. Instead of using an adaptive selection mechanism $\pi$ to choose among multiple stationary destroy heuristics $H_x$ [li2021anytime], ADDRESS (our approach) only uses a single adaptive heuristic.
  • Figure 2: Detailed overview of ADDRESS. For each agent $a_i \in \mathcal{A}$, we maintain two parameters $\alpha_i, \beta_i > 0$. At each LNS iteration, all agents are sorted by their delays. A restricted Thompson Sampling approach is applied to the top-$K$ set of the most delayed agents, according to their samples $q_i \sim \textit{Beta}(\alpha_i,\beta_i)$, to choose a seed agent index $j$. The path of the seed agent $a_j$ is used to generate an LNS neighborhood $A_N \subset \mathcal{A}$ via random walks. After running the LNS destroy-and-repair operations on $A_N$, the parameters $\alpha_j$ or $\beta_j$ of the seed agent $a_j$ are updated, depending on the cost improvement of the new solution.
  • Figure 3: Sum of delays for ADDRESS (using $\epsilon$-greedy or Thompson Sampling) compared with MAPF-LNS (using only the agent-based heuristic) for different numbers of options $K$ with $m = 700$ agents in both maps, a time budget of 60 seconds, and $\epsilon = \frac{1}{2}$.
  • Figure 4: Sum of delays and AUC for ADDRESS (using $\epsilon$-greedy or Thompson Sampling) compared with MAPF-LNS (using only the agent-based heuristic) for different time budgets (starting from 15 seconds) with $m = 700$ agents in both maps and $\epsilon = \frac{1}{2}$. Shaded areas show the 95% confidence interval.
  • Figure 5: Sum of delays (left) and AUC (middle) for ADDRESS compared with the original MAPF-LNS (with and without our ADDRESS heuristic) for different time budgets (starting from 15 seconds) with $m = 700$ agents in all maps. Shaded areas show the 95% confidence interval. Right: Evolution of the selection weights of MAPF-LNS with our ADDRESS heuristic over time.
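Figures 3 and 4 also evaluate an $\epsilon$-greedy variant of ADDRESS with $\epsilon = \frac{1}{2}$. A minimal sketch of such a rule over the top-$K$ set follows; the greedy criterion (highest posterior mean $\alpha_i/(\alpha_i+\beta_i)$) and the function name `epsilon_greedy_seed` are assumptions made for illustration, not details confirmed by the figure captions.

```python
import random


def epsilon_greedy_seed(delays, alpha, beta, k, epsilon, rng):
    """Choose a seed agent among the top-k most delayed agents with epsilon-greedy.

    With probability epsilon, explore uniformly within the top-k set; otherwise
    exploit the agent with the highest posterior mean alpha_i / (alpha_i + beta_i),
    which is an assumed greedy criterion.
    """
    top_k = sorted(range(len(delays)), key=lambda i: delays[i], reverse=True)[:k]
    if rng.random() < epsilon:
        return rng.choice(top_k)  # explore
    return max(top_k, key=lambda i: alpha[i] / (alpha[i] + beta[i]))  # exploit


# Usage: agent 4 has the best success record among the top-3 (agents 3, 1, 4).
delays = [0, 7, 2, 9, 4]
alpha = [1.0, 1.0, 1.0, 1.0, 5.0]
beta = [1.0, 1.0, 1.0, 1.0, 1.0]
rng = random.Random(42)
picks = [epsilon_greedy_seed(delays, alpha, beta, k=3, epsilon=0.5, rng=rng)
         for _ in range(1000)]
```

Unlike Thompson Sampling, this rule explores at a fixed rate regardless of how confident the estimates are; with $\epsilon = \frac{1}{2}$, half of the iterations ignore the learned parameters entirely.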