Ensembling Prioritized Hybrid Policies for Multi-agent Pathfinding
Huijie Tang, Federico Berto, Jinkyoo Park
TL;DR
This paper tackles multi-agent pathfinding (MAPF) under challenging dense-obstacle conditions by introducing Ensembling Prioritized Hybrid Policies (EPH). EPH combines an enhanced selective graph-convolution communication framework with a Double Dueling DQN training regime and a suite of inference-time strategies: hybrid expert guidance, priority-based conflict resolution, deadlock escape, and solver ensembling. Evaluated on random and structured maps, EPH demonstrates competitive or superior performance against state-of-the-art neural MAPF solvers and some classical heuristics, often achieving higher success rates and shorter makespans. The results highlight the practical potential of combining richer local communication with principled inference-time diversification for robust, scalable MAPF in real-world multi-agent systems.
Abstract
Multi-Agent Reinforcement Learning (MARL) based Multi-Agent Path Finding (MAPF) has recently gained attention due to its efficiency and scalability. Several MARL-MAPF methods use communication to enrich the information each agent can perceive. However, existing works still struggle in structured environments with high obstacle density and a large number of agents. To further improve the performance of communication-based MARL-MAPF solvers, we propose a new method, Ensembling Prioritized Hybrid Policies (EPH). We first propose a selective communication block to gather richer information for better agent coordination within multi-agent environments and train the model with a Q-learning-based algorithm. We further introduce three advanced inference strategies aimed at bolstering performance during the execution phase. First, we hybridize the neural policy with single-agent expert guidance for navigating conflict-free zones. Second, we propose Q-value-based methods for prioritized resolution of conflicts as well as deadlock situations. Finally, we introduce a robust ensemble method that can efficiently collect the best out of multiple possible solutions. We empirically evaluate EPH in complex multi-agent environments and demonstrate competitive performance against state-of-the-art neural methods for MAPF. We open-source our code at https://github.com/ai4co/eph-mapf.
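To make the ensembling idea concrete, the following is a minimal sketch of selecting the best of several candidate solutions. It is not the released EPH implementation: the `Solution` class, the `ensemble_best` function, the toy solver variants, and the selection rule (keep successful solutions, then prefer the smallest makespan) are all assumptions made for illustration, and the candidates are run sequentially here for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Path = List[Tuple[int, int]]  # one agent's sequence of grid cells (hypothetical format)


@dataclass
class Solution:
    """Hypothetical container for one candidate MAPF solution."""
    paths: List[Path]  # one path per agent
    success: bool      # True if every agent reached its goal

    @property
    def makespan(self) -> int:
        # Timesteps until the last agent finishes (0 if there are no paths).
        return max((len(p) - 1 for p in self.paths), default=0)


def ensemble_best(solvers: Sequence[Callable[[], Solution]]) -> Optional[Solution]:
    """Run every candidate solver and keep the best successful solution.

    Assumed selection rule: discard failed runs, then pick the smallest makespan.
    Returns None when no candidate succeeds.
    """
    candidates = [solve() for solve in solvers]
    successful = [s for s in candidates if s.success]
    if not successful:
        return None
    return min(successful, key=lambda s: s.makespan)


# Toy stand-ins for differently configured inference strategies.
def variant_a() -> Solution:
    # Agent 0 takes a detour to its goal at (0, 2): makespan 4.
    return Solution(
        paths=[[(0, 0), (1, 0), (1, 1), (1, 2), (0, 2)], [(2, 0), (2, 1)]],
        success=True,
    )


def variant_b() -> Solution:
    # Agent 0 goes straight to (0, 2): makespan 2.
    return Solution(
        paths=[[(0, 0), (0, 1), (0, 2)], [(2, 0), (2, 1)]],
        success=True,
    )


def variant_c() -> Solution:
    # A run that failed to bring every agent to its goal.
    return Solution(paths=[[(0, 0)], [(2, 0)]], success=False)


best = ensemble_best([variant_a, variant_b, variant_c])
```

In this toy setup, `best` is the solution produced by `variant_b`, since it succeeds with the shortest makespan. Other tie-breaking criteria (for example, total path cost) could be substituted in the `key` function without changing the overall structure.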
