Deadlock-Free Hybrid RL-MAPF Framework for Zero-Shot Multi-Robot Navigation
Haoyi Wang, Licheng Luo, Yiannis Kantaros, Bruno Sinopoli, Mingyu Cai
TL;DR
The paper tackles deadlocks in multi-robot navigation by combining decentralized RL-based navigation with on-demand, locally confined MAPF to resolve topological bottlenecks. A deadlock detector triggers short, cropped MAPF subproblems for only the implicated agents, solved via Push-and-Rotate and executed as dense waypoint sequences while all other agents continue with RL. This on-demand coordination preserves decentralized efficiency and scales to unseen environments, with formal guarantees and polynomial-time complexity for the local MAPF solver. Empirical results in doorway and corridor scenarios show dramatic improvements in task success, demonstrating robust zero-shot performance. The framework offers a practical, scalable approach to integrating learning-based control with classical planning for reliable multi-robot coordination.
Abstract
Multi-robot navigation in cluttered environments presents fundamental challenges in balancing reactive collision avoidance with long-range goal achievement. When robots navigate through narrow passages or confined spaces, deadlocks frequently emerge that prevent agents from reaching their destinations, particularly when Reinforcement Learning (RL) control policies encounter novel configurations outside the training distribution. Existing RL-based approaches generalize poorly to unseen environments. We propose a hybrid framework that integrates RL-based reactive navigation with on-demand Multi-Agent Path Finding (MAPF) to explicitly resolve topological deadlocks. Our approach includes a safety layer that monitors agent progress to detect deadlocks and, when one is detected, triggers a coordination controller for the affected agents. The framework constructs globally feasible trajectories via MAPF and regulates waypoint progression to reduce inter-agent conflicts during navigation. Extensive evaluation on dense multi-agent benchmarks shows that our method boosts task completion from marginal to near-universal success, markedly reducing deadlocks and collisions. When integrated with hierarchical task planning, it enables coordinated navigation for heterogeneous robots, demonstrating that coupling reactive RL navigation with selective MAPF intervention yields robust zero-shot performance.
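To make the progress-monitoring idea concrete, the following is a minimal sketch of how a safety layer might flag stalled agents and hand them to a MAPF-based coordinator while the rest stay under RL control. It is not the authors' implementation. The class and function names (ProgressMonitor, select_controller), the sliding-window test on goal distance, and the values for window, min_progress, and goal_tolerance are all assumptions chosen for illustration.

```python
from collections import deque
import math


class ProgressMonitor:
    """Hypothetical deadlock detector: an agent is considered stalled when its
    distance to goal has dropped by less than `min_progress` over the last
    `window` updates (assumed criterion, not taken from the paper)."""

    def __init__(self, window=20, min_progress=0.05, goal_tolerance=0.1):
        self.window = window
        self.min_progress = min_progress
        self.goal_tolerance = goal_tolerance
        self.history = {}  # agent_id -> recent goal distances

    def update(self, agent_id, position, goal):
        """Record the agent's current goal distance and return its stall status."""
        dist = math.dist(position, goal)
        buf = self.history.setdefault(agent_id, deque(maxlen=self.window))
        buf.append(dist)
        return self.is_stalled(agent_id)

    def is_stalled(self, agent_id):
        buf = self.history.get(agent_id)
        if buf is None or len(buf) < self.window:
            return False  # not enough history to judge yet
        if buf[-1] <= self.goal_tolerance:
            return False  # agent is resting at its goal, not deadlocked
        return (buf[0] - buf[-1]) < self.min_progress


def select_controller(monitor, agent_ids):
    """Route stalled agents to the MAPF coordinator; the others keep using RL."""
    return {a: ("mapf" if monitor.is_stalled(a) else "rl") for a in agent_ids}


if __name__ == "__main__":
    monitor = ProgressMonitor(window=3, min_progress=0.5)
    for step in range(3):
        monitor.update(0, (0.0, 0.0), (5.0, 0.0))          # agent 0 never moves
        monitor.update(1, (float(step), 1.0), (5.0, 1.0))  # agent 1 advances
    print(select_controller(monitor, [0, 1]))
```

In this sketch only the agents flagged as stalled are routed to the coordinator, mirroring the on-demand intervention described above. A fuller detector would presumably also group stalled agents by proximity before forming a local MAPF subproblem.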
