Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic

Thomy Phan; Benran Zhang; Shao-Hung Chan; Sven Koenig

TL;DR

This work tackles the scalability limits of Anytime MAPF by introducing ADDRESS, a single-destroy-heuristic variant of MAPF-LNS. ADDRESS replaces multiple stationary destroy heuristics with one adaptive heuristic: restricted Thompson Sampling is applied to the top-$K$ most delayed agents to choose the seed agent for LNS neighborhood generation. This enables faster improvement in solution cost and AUC, even in large-scale instances with up to 1000 agents. The authors report substantial performance gains across five MAPF benchmarks, outperforming MAPF-LNS, BALANCE, and LaCAM*, and show that the approach is robust across parameter choices and time budgets. The proposed method offers a simple yet effective blueprint for online adaptation in combinatorial optimization and can be extended to other domains where high-cost variables govern neighborhood generation.

Abstract

Anytime multi-agent path finding (MAPF) is a promising approach to scalable path optimization in multi-agent systems. MAPF-LNS, based on Large Neighborhood Search (LNS), is the current state-of-the-art approach where a fast initial solution is iteratively optimized by destroying and repairing selected paths of the solution. Current MAPF-LNS variants commonly use an adaptive selection mechanism to choose among multiple destroy heuristics. However, to determine promising destroy heuristics, MAPF-LNS requires a considerable amount of exploration time. As common destroy heuristics are non-adaptive, any performance bottleneck caused by these heuristics cannot be overcome via adaptive heuristic selection alone, thus limiting the overall effectiveness of MAPF-LNS in terms of solution cost. In this paper, we propose Adaptive Delay-based Destroy-and-Repair Enhanced with Success-based Self-Learning (ADDRESS) as a single-destroy-heuristic variant of MAPF-LNS. ADDRESS applies restricted Thompson Sampling to the top-K set of the most delayed agents to select a seed agent for adaptive LNS neighborhood generation. We evaluate ADDRESS in multiple maps from the MAPF benchmark set and demonstrate cost improvements by at least 50% in large-scale scenarios with up to a thousand agents, compared with the original MAPF-LNS and other state-of-the-art methods.

Paper Structure (34 sections, 6 figures, 1 table, 2 algorithms)

Introduction
Background
Multi-Agent Path Finding (MAPF)
Anytime MAPF with LNS
Multi-Armed Bandits
Related Work
Multi-Armed Bandits for LNS
Machine Learning in Anytime MAPF
Adaptive Delay-Based MAPF-LNS
Original Agent-Based Destroy Heuristic
ADDRESS Destroy Heuristic
ADDRESS Formulation
Conceptual Discussion
Experiments
Maps
...and 19 more sections

Figures (6)

Figure 1: Scheme of our contribution. Instead of using an adaptive selection mechanism $\pi$ to choose among multiple stationary destroy heuristics $H_x$ (Li et al. 2021), ADDRESS (our approach) only uses a single adaptive heuristic.
Figure 2: Detailed overview of ADDRESS. For each agent $a_i \in \mathcal{A}$, we maintain two parameters $\alpha_i, \beta_i > 0$. At each LNS iteration, all agents are sorted with respect to their delays. A restricted Thompson Sampling approach is applied to the top-$K$ set of the most delayed agents, according to their samples $q_i \sim \textit{Beta}(\alpha_i,\beta_i)$, to choose a seed agent index $j$. The path of the seed agent $a_j$ is used to generate an LNS neighborhood $A_N \subset \mathcal{A}$ via random walks. After running the LNS destroy-and-repair operations on $A_N$, the parameters $\alpha_j$ or $\beta_j$ of the seed agent $a_j$ are updated, depending on the cost improvement of the new solution.
Figure 3: Sum of delays for ADDRESS (using $\epsilon$-greedy or Thompson Sampling) compared with MAPF-LNS (using only the agent-based heuristic) for different numbers of options $K$ with $m = 700$ agents in both maps, a time budget of 60 seconds, and $\epsilon = \frac{1}{2}$.
Figure 4: Sum of delays and AUC for ADDRESS (using $\epsilon$-greedy or Thompson Sampling) compared with MAPF-LNS (using only the agent-based heuristic) for different time budgets (starting from 15 seconds) with $m = 700$ agents in both maps and $\epsilon = \frac{1}{2}$. Shaded areas show the 95% confidence interval.
Figure 5: Sum of delays (left) and AUC (middle) for ADDRESS compared with the original MAPF-LNS (with and without our ADDRESS heuristic) for different time budgets (starting from 15 seconds) with $m = 700$ agents in all maps. Shaded areas show the 95% confidence interval. Right: Evolution of the selection weights of MAPF-LNS with our ADDRESS heuristic over time.
...and 1 more figure
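
The seed selection and parameter update described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `select_seed` and `update_seed`, the unit increments, and the rule that a strictly lower cost counts as a success are all assumptions made for the example. The random-walk neighborhood generation and the repair step are omitted.

```python
import random


def select_seed(delays, alpha, beta, k, rng):
    """Choose a seed agent index with Thompson Sampling restricted
    to the k most delayed agents (hypothetical helper)."""
    # Sort agent indices by delay, most delayed first, and keep the top k.
    order = sorted(range(len(delays)), key=lambda i: delays[i], reverse=True)
    top_k = order[:k]
    # Draw one sample q_i ~ Beta(alpha_i, beta_i) per candidate agent.
    samples = {i: rng.betavariate(alpha[i], beta[i]) for i in top_k}
    # The candidate with the largest sample becomes the seed agent.
    return max(top_k, key=lambda i: samples[i])


def update_seed(alpha, beta, j, cost_before, cost_after):
    """Update the seed agent's parameters after destroy-and-repair.
    Assumption: a strict cost decrease is a success, and each update
    adds 1 to the corresponding parameter."""
    if cost_after < cost_before:
        alpha[j] += 1.0
    else:
        beta[j] += 1.0


if __name__ == "__main__":
    rng = random.Random(0)
    delays = [0, 5, 2, 9, 1]   # toy per-agent delays
    alpha = [1.0] * len(delays)
    beta = [1.0] * len(delays)
    j = select_seed(delays, alpha, beta, k=3, rng=rng)
    # Pretend the repair lowered the solution cost from 17 to 15.
    update_seed(alpha, beta, j, cost_before=17, cost_after=15)
    print("seed agent:", j, "alpha:", alpha[j], "beta:", beta[j])
```

Restricting the sampling to the top-$K$ agents keeps the per-iteration work independent of the total number of agents beyond the sort, while the Beta parameters let the selection shift away from seed agents whose neighborhoods repeatedly fail to improve the solution.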
