HiMAP: Learning Heuristics-Informed Policies for Large-Scale Multi-Agent Pathfinding

Huijie Tang, Federico Berto, Zihan Ma, Chuanbo Hua, Kyuree Ahn, Jinkyoo Park

TL;DR

HiMAP tackles large-scale MAPF by learning policies through imitation of heuristic expert solutions in a decentralized framework. It employs a PRIMAL-like network trained on expert actions from heuristic solvers and improves planning with inference techniques such as history-based pruning, adaptive exploration, obstacle caching for completed agents, and conflict correction. The paper demonstrates that imitation-learning-only MAPF with these inference techniques can achieve competitive success rates and scalability while being simpler to train than reinforcement learning approaches. It also discusses limitations and outlines future work on hybrid policies, domain-knowledge integration, and agent communication to further improve performance.

Abstract

Large-scale multi-agent pathfinding (MAPF) presents significant challenges in several areas. As systems grow in complexity with a multitude of autonomous agents operating simultaneously, efficient and collision-free coordination becomes paramount. Traditional algorithms often scale poorly, especially in complex scenarios. Reinforcement Learning (RL) has shown potential for MAPF, but it also struggles with scalability: it demands intricate implementation and lengthy training and often exhibits unstable convergence, which limits its practical application. In this paper, we introduce Heuristics-Informed Multi-Agent Pathfinding (HiMAP), a novel scalable approach that employs imitation learning with heuristic guidance in a decentralized manner. We train on small-scale instances using a heuristic policy as a teacher that maps each agent's observation to an action probability distribution. During pathfinding, we adopt several inference techniques to improve performance. With a simple training scheme and implementation, HiMAP achieves competitive success rates and scalability among imitation-learning-only MAPF methods, showing the potential of imitation-learning-only MAPF equipped with inference techniques.
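The inference-time ideas named above can be illustrated with a small sketch. This is not the authors' implementation: the five-action encoding, the function names `softmax` and `select_action`, the `temperature` parameter (a stand-in knob for exploration), and the exact masking rules are all assumptions made for illustration. It shows how a learned action distribution might be masked so that an agent avoids cells it has already visited and treats agents that have reached their goals as obstacles. Conflict correction between agents is not covered.

```python
import math
import random

# Assumed action encoding: stay plus the four cardinal moves (row, col offsets).
ACTIONS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits, pos, visited, obstacles, grid_size,
                  temperature=1.0, rng=random):
    """Sample an action index after masking blocked and already-visited cells.

    `obstacles` is assumed to hold static obstacles plus the cells of agents
    that have already finished; `visited` is this agent's own history.
    """
    probs = softmax(logits, temperature)
    rows, cols = grid_size
    masked = []
    for p, (dr, dc) in zip(probs, ACTIONS):
        nxt = (pos[0] + dr, pos[1] + dc)
        in_bounds = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
        blocked = (not in_bounds) or nxt in obstacles
        revisit = nxt != pos and nxt in visited  # staying put is not a re-visit
        masked.append(0.0 if blocked or revisit else p)
    total = sum(masked)
    if total == 0.0:
        return 0  # nothing admissible: fall back to waiting in place
    weights = [m / total for m in masked]
    return rng.choices(range(len(ACTIONS)), weights=weights, k=1)[0]


# Hypothetical 3x3 example: the agent stands at (1, 1).
obstacles = {(0, 1)}          # a static obstacle above the agent
obstacles.add((1, 0))         # a finished agent parked at its goal, to the left
visited = {(1, 1), (1, 2)}    # the agent already passed through the cell to the right
action = select_action([0.1, 2.0, 0.5, 0.3, 0.2], (1, 1), visited, obstacles, (3, 3))
print(ACTIONS[action])
```

In this example only "stay" and "down" survive the mask, so the sampled action is one of those two even though the raw scores favor moving up.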

Paper Structure

This paper contains 6 sections and 1 figure.

Figures (1)

  • Figure 1: Success rate of HiMAP and ablated models. Baseline: HiMAP with neither re-visit prevention (No History) nor treatment of completed agents as obstacles (No TCAO).