Scalable Multi-Agent Path Finding using Collision-Aware Dynamic Alert Mask and a Hybrid Execution Strategy

Bharath Muppasani, Ritirupa Dey, Biplav Srivastava, Vignesh Narayanan

TL;DR

The paper tackles scalable multi-agent path finding (MAPF) by addressing the information-sharing bottleneck that plagues fully centralized and fully distributed approaches. It introduces a hybrid framework that integrates decentralized reinforcement learning–based path planning with a lightweight centralized collision detector and a selective alert mechanism, guided by a dynamic AlertMask. The method combines a four-stage pipeline (S1–S4) with tiered information sharing (static or short-horizon dynamic constraints) to trigger localized replanning, achieving high feasibility and collision-free operation in large-scale environments while dramatically reducing inter-agent data exchange by about $93\%$. Empirical evaluation across maze and warehouse maps demonstrates strong generalization from simple training and competitive performance against state-of-the-art baselines, highlighting significant practical potential for privacy-preserving, scalable autonomous systems.

Abstract

Multi-agent pathfinding (MAPF) remains a critical problem in robotics and autonomous systems, where agents must navigate shared spaces efficiently while avoiding conflicts. Traditional centralized algorithms with access to global information, such as Conflict-Based Search (CBS), provide high-quality solutions but become computationally expensive in large-scale scenarios because of the combinatorial explosion of conflicts that must be resolved. Conversely, distributed approaches that rely on local information, particularly learning-based methods, scale better by operating with relaxed information availability, yet often at the cost of solution quality. To address these limitations, we propose a hybrid framework that combines decentralized path planning with a lightweight centralized coordinator. Our framework leverages reinforcement learning (RL) for decentralized planning, enabling agents to adapt their plans based on minimal, targeted alerts (such as static conflict-cell flags or brief conflict tracks) that the central coordinator shares dynamically for effective conflict resolution. We empirically study how the information available to an agent affects its planning performance. Our approach reduces inter-agent information sharing compared to fully centralized and distributed methods, while still consistently finding feasible, collision-free solutions, even in large-scale scenarios with higher agent counts.
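
To make the coordinator's role concrete, the following is a minimal sketch of how a central collision detector could scan independently planned paths and derive a static conflict-cell alert for one agent. It is an illustration under stated assumptions, not the paper's implementation: the function names (`position_at`, `detect_conflicts`, `static_alert_cells`), the path representation (one grid cell per timestep), and the rule that an agent waits at its goal after its path ends are all hypothetical. A vertex collision is taken to mean two agents in the same cell at the same timestep, and an edge collision to mean two agents swapping adjacent cells between consecutive timesteps.

```python
from itertools import combinations


def position_at(path, t):
    """Cell occupied at timestep t; assumes the agent waits at its goal afterwards."""
    return path[min(t, len(path) - 1)]


def detect_conflicts(paths):
    """Scan all agent pairs and return (kind, agent_a, agent_b, timestep, cells) tuples.

    `paths` maps a hypothetical agent id to a list of (row, col) cells,
    one entry per timestep.
    """
    conflicts = []
    horizon = max(len(p) for p in paths.values())
    for a, b in combinations(sorted(paths), 2):
        for t in range(horizon):
            a_now, b_now = position_at(paths[a], t), position_at(paths[b], t)
            if a_now == b_now:
                # Vertex collision: both agents occupy the same cell at time t.
                conflicts.append(("vertex", a, b, t, (a_now,)))
                continue
            if t + 1 < horizon:
                a_next = position_at(paths[a], t + 1)
                b_next = position_at(paths[b], t + 1)
                if a_next == b_now and b_next == a_now:
                    # Edge collision: the agents swap cells between t and t + 1.
                    conflicts.append(("edge", a, b, t, (a_now, b_now)))
    return conflicts


def static_alert_cells(conflicts, agent):
    """Cells to flag for one agent: only conflicts that involve that agent."""
    cells = set()
    for _kind, a, b, _t, involved in conflicts:
        if agent in (a, b):
            cells.update(involved)
    return cells


if __name__ == "__main__":
    paths = {
        "A1": [(0, 0), (0, 1), (0, 2)],
        "A2": [(0, 2), (0, 1), (0, 0)],
        "A3": [(3, 3), (3, 4)],
    }
    found = detect_conflicts(paths)
    print(found)
    print(static_alert_cells(found, "A1"))
    print(static_alert_cells(found, "A3"))
```

In this sketch only the agents involved in a conflict receive any cells, and an uninvolved agent (`A3` in the usage example) receives an empty set. That mirrors the idea of minimal, targeted alerts, although the actual alert format and selection rule used by the authors may differ.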

Paper Structure

This paper contains 37 sections, 15 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: An example MAPF problem and our four-stage planning pipeline. Left: Three agents (A1, A2, A3) navigate a grid with static obstacles (dark gray). The diagram illustrates a future vertex collision (red X), where two agents would occupy the same cell, and an edge collision (red arrows), where agents would swap adjacent cells. Top Right: The varying levels of information available to Agent A1 under centralized (all agent positions and goals), distributed (nearby agent positions and goals), and decentralized (only nearby agent positions) paradigms. Bottom Right: The four stages of our framework, from initial Path Planning (S1) to Collision Detection (S2), Resolution (S3), and Replanning (S4).
  • Figure 2: The plot illustrates the training performance of the Double Deep Q-Network (DDQN) algorithm. Episode rewards (red), sample efficiency measured by rewards per total frames (purple), training loss (blue), and episode length (orange) are presented across episodes. Smoothed curves represent moving averages, enhancing the visibility of underlying performance trends.
  • Figure 3: The plot illustrates the training performance of the Proximal Policy Optimization (PPO) algorithm. Episode rewards (red), sample efficiency measured by rewards per total frames (purple), training loss (blue), and episode length (orange) are presented across episodes. Smoothed curves represent moving averages, enhancing the visibility of underlying performance trends.
  • Figure :