Approximate Dec-POMDP Solving Using Multi-Agent A*
Wietze Koops, Sebastian Junges, Nils Jansen
TL;DR
This work tackles finite-horizon Dec-POMDPs with an approximate, scalable A*-based framework. It combines clustered sliding-window memory, queue pruning, and loose yet scalable heuristics (including a novel terminal-reward upper-bound strategy) to find high-quality policies and to compute tight upper bounds for long horizons. The proposed PF-MAA$^*$ and TR-MAA$^*$ achieve competitive or superior policy quality across standard benchmarks and provide upper bounds that scale up to horizon $h=100$, with BoxPushing policies within 1% of their bounds. Together, they extend the horizon at which Dec-POMDP policies can be computed and supply upper bounds against which policy quality can be measured.
Abstract
We present an A*-based algorithm to compute policies for finite-horizon Dec-POMDPs. Our goal is to sacrifice optimality in favor of scalability for larger horizons. The main ingredients of our approach are (1) using clustered sliding-window memory, (2) pruning the A* search tree, and (3) using novel A* heuristics. Our experiments show performance competitive with the state of the art and, on multiple benchmarks, superior performance. In addition, we provide an A* algorithm that finds upper bounds for the optimum, tailored towards problems with long horizons. The main ingredient is a new heuristic that periodically reveals the state, thereby limiting the number of reachable beliefs. Our experiments demonstrate the efficacy and scalability of the approach.
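To make the first ingredient concrete, the following is a minimal sketch of plain sliding-window memory for a single agent, written under our own assumptions rather than taken from the paper. The names `window`, `count_full_histories`, `count_windows`, and `act`, the observation set, and the action labels are all hypothetical, and the clustering step that the approach applies on top of the window is not modelled here. The sketch only shows why restricting a policy to the last $k$ observations bounds the number of memory states that the search has to assign actions to.

```python
from itertools import product


def window(history, k):
    """Return the last k observations of a history as a tuple."""
    history = tuple(history)
    # history[-0:] would return the whole tuple, so k == 0 is handled apart.
    return history[-k:] if k > 0 else ()


def count_full_histories(num_obs, t):
    """Number of distinct observation histories of length t."""
    return num_obs ** t


def count_windows(num_obs, t, k):
    """Number of distinct windows after t observations with window size k."""
    return num_obs ** min(t, k)


def act(policy, history, k):
    """Look up the action for a history using only its last k observations."""
    return policy[window(history, k)]


# Hypothetical toy setting: two observations, window size 2.
OBSERVATIONS = ("left", "right")
K = 2

# A policy assigns one action to every window of length 0..K (1 + 2 + 4 = 7).
policy = {
    w: "listen"
    for n in range(K + 1)
    for w in product(OBSERVATIONS, repeat=n)
}
policy[("left", "left")] = "open_right"
policy[("right", "right")] = "open_left"

if __name__ == "__main__":
    # The action depends only on the two most recent observations.
    print(act(policy, ["right", "left", "left"], K))
    # Memory states at stage 10: full histories versus size-2 windows.
    print(count_full_histories(2, 10), count_windows(2, 10, K))
```

With full histories the number of memory states grows exponentially in the stage, whereas with a fixed window it stops growing once the stage exceeds $k$. This is the kind of trade-off between optimality and scalability that the abstract describes.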
