Approximate Dec-POMDP Solving Using Multi-Agent A*

Wietze Koops; Sebastian Junges; Nils Jansen

TL;DR

This work tackles finite-horizon Dec-POMDPs by developing an approximate, scalable A*-based framework. It combines clustered sliding-window memory, queue pruning, and loose yet scalable heuristics (including a novel terminal-reward upper-bound strategy) to find high-quality policies while producing tight upper bounds for long horizons. The proposed PF-MAA$^*$ and TR-MAA$^*$ achieve competitive or superior policy quality across standard benchmarks and provide scalable upper bounds up to horizon $h=100$, with BoxPushing policies within 1% of their bounds. The methodology significantly extends the practical horizon for Dec-POMDP planning and offers robust upper bounds, enabling more reliable decision-making in multi-agent, partially observable domains.

Abstract

We present an A*-based algorithm to compute policies for finite-horizon Dec-POMDPs. Our goal is to sacrifice optimality in favor of scalability to larger horizons. The main ingredients of our approach are (1) clustered sliding window memory, (2) pruning of the A* search tree, and (3) novel A* heuristics. Our experiments show performance competitive with the state of the art, and on multiple benchmarks our approach performs better. In addition, we provide an A* algorithm that computes upper bounds on the optimal value, tailored to problems with long horizons. Its main ingredient is a new heuristic that periodically reveals the state, thereby limiting the number of reachable beliefs. Our experiments demonstrate the efficacy and scalability of the approach.
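The combination of A* search and queue pruning mentioned in the abstract can be illustrated with a generic best-first search. This is a minimal sketch, not the paper's MAA$^*$ implementation: nodes stand in for partial joint policies, `heuristic` is assumed to over-estimate the reward still obtainable (an upper bound), and the names `pruned_astar`, `expand`, `max_queue`, and the toy `REWARDS` table are hypothetical. Without pruning, the first complete node popped is optimal; with pruning, the result is only a feasible solution, i.e. a lower bound.

```python
import heapq
import itertools
from typing import Callable, Hashable, Iterable, Optional, Tuple


def pruned_astar(
    root: Hashable,
    expand: Callable[[Hashable], Iterable[Tuple[Hashable, float]]],
    heuristic: Callable[[Hashable], float],
    is_complete: Callable[[Hashable], bool],
    max_queue: int,
) -> Optional[Tuple[Hashable, float]]:
    """Best-first (A*) search for a reward-maximization problem.

    heuristic(node) should over-estimate the reward still obtainable below node.
    After every expansion only the max_queue most promising entries are kept,
    so the returned solution is feasible but no longer guaranteed optimal.
    """
    tie = itertools.count()  # tie-breaker: nodes themselves are never compared
    # heapq is a min-heap, so store -f to pop the entry with the largest f = g + h.
    queue = [(-heuristic(root), next(tie), root, 0.0)]
    while queue:
        _, _, node, g = heapq.heappop(queue)
        if is_complete(node):
            return node, g
        for child, reward in expand(node):
            g_child = g + reward
            f_child = g_child + heuristic(child)
            heapq.heappush(queue, (-f_child, next(tie), child, g_child))
        if len(queue) > max_queue:
            # Queue pruning: keep the max_queue entries with the highest f.
            # nsmallest returns a sorted list, which is already a valid heap.
            queue = heapq.nsmallest(max_queue, queue)
    return None


# Hypothetical toy problem: choose action 0 or 1 at each of three stages.
REWARDS = [[1.0, 0.0], [0.0, 3.0], [2.0, 2.5]]


def expand(node):
    t = len(node)
    return [(node + (a,), REWARDS[t][a]) for a in range(2)]


def heuristic(node):
    # Upper bound: best possible reward in every remaining stage.
    return sum(max(r) for r in REWARDS[len(node):])


best, value = pruned_astar((), expand, heuristic, lambda n: len(n) == len(REWARDS), max_queue=2)
```

In this toy instance the heuristic is exact, so even a small queue finds the best sequence `(0, 1, 1)` with value 6.5; with a loose heuristic, aggressive pruning can discard the optimal branch, which is the optimality-for-scalability trade-off the abstract describes.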

Paper Structure

This paper contains 71 sections, 7 theorems, 29 equations, 3 figures, 7 tables, and 1 algorithm.

Introduction
Small-step MAA$^*$.
Lower bounds by finding policies.
Proving upper bounds.
Technical ingredient 1: Clustering with sliding-window memory.
Technical ingredient 2: Pruning the queue.
Technical ingredient 3: Loose heuristics.
Technical ingredient 4: Scalable and tight heuristics for upper bounds.
Contributions.
Problem Statement
Sliding window memory.
Clustering
Cluster policies.
Clustered Sliding Window Memory
Probability-based clustering.
...and 56 more sections

Key Result

Lemma 1

If a clustering is incremental, coarser than sliding $k$-window memory and finer than belief-equivalence, it is lossless w.r.t. sliding $k$-window memory.
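Lemma 1 relies on two partition relations, "coarser than" and "finer than". The sketch below checks them by brute force on a finite set of histories. It assumes a history is a tuple of (action, observation) pairs and that sliding $k$-window memory keeps the last $k$ pairs; belief-equivalence is replaced by a hypothetical label function `belief_key`, and the third condition of the lemma (incrementality) is not checked.

```python
from itertools import combinations, product
from typing import Callable, Hashable, Sequence, Tuple

Step = Tuple[str, str]      # (action, observation): assumed history encoding
History = Tuple[Step, ...]
Labeling = Callable[[History], Hashable]


def window(history: History, k: int) -> History:
    """Sliding k-window memory: keep only the last k steps (assumed definition)."""
    return history[-k:] if k > 0 else ()


def is_coarser(clustering: Labeling, finer: Labeling, histories: Sequence[History]) -> bool:
    """True if any two histories that `finer` groups together share a cluster."""
    return all(
        clustering(h1) == clustering(h2)
        for h1, h2 in combinations(histories, 2)
        if finer(h1) == finer(h2)
    )


def is_finer(clustering: Labeling, coarser: Labeling, histories: Sequence[History]) -> bool:
    """True if any two histories in the same cluster are also grouped by `coarser`."""
    return is_coarser(coarser, clustering, histories)


# All length-2 histories over hypothetical actions {L, R} and observations {x, y}.
steps = list(product("LR", "xy"))
histories = [tuple(p) for p in product(steps, repeat=2)]


def one_window(h: History) -> History:
    return window(h, 1)


def by_last_obs(h: History) -> Hashable:
    # Candidate clustering: remember only the last observation.
    return h[-1][1]


def belief_key(h: History) -> Hashable:
    # Hypothetical stand-in for belief-equivalence in a toy model where the
    # belief depends only on the last observation.
    return h[-1][1]


coarser_ok = is_coarser(by_last_obs, one_window, histories)       # True
finer_ok = is_finer(by_last_obs, belief_key, histories)           # True
single_cluster_ok = is_finer(lambda h: 0, belief_key, histories)  # False
```

Merging all histories into one cluster is trivially coarser than the 1-window memory but not finer than the (toy) belief-equivalence, so the lemma's hypotheses fail for it.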

Figures (3)

Figure 1: Revealing the state for a joint belief $b$ at stage $r$ over three states $s_1$, $s_2$, $s_3$. On the left, the state is revealed at stage $r$; on the right, it is revealed at stage $r+1$. In the latter case, the three policies are forced to take the same action in stage $0$.
Figure 2: Schematic call graph for PF-MAA$^*$, for $r=3$. A dotted line indicates that a result is provided.
Figure 3: Schematic call graph for TR-MAA$^*$, for $r=5$. A dotted line indicates that a result is provided.
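The trade-off in Figure 1 can be made concrete in a simplified one-step view. Assume a hypothetical table $Q(s,a)$ giving the value of joint action $a$ in state $s$ when the state is known afterwards. Revealing at stage $r$ lets each state pick its own action, giving $\sum_s b(s)\max_a Q(s,a)$; revealing at stage $r+1$ forces one shared action, giving $\max_a \sum_s b(s)\,Q(s,a)$. The first is always at least the second, so in this view revealing later yields a tighter upper bound. The numbers below are made up for illustration.

```python
from typing import Dict, List

Belief = Dict[str, float]          # joint belief b over states
QValues = Dict[str, List[float]]   # hypothetical Q(s, a) for each state and joint action


def reveal_at_r(b: Belief, q: QValues) -> float:
    """State revealed at stage r: every state may use its own best action."""
    return sum(p * max(q[s]) for s, p in b.items())


def reveal_at_r_plus_1(b: Belief, q: QValues) -> float:
    """State revealed at stage r+1: all states share one joint action at stage r."""
    n_actions = len(next(iter(q.values())))
    return max(sum(p * q[s][a] for s, p in b.items()) for a in range(n_actions))


b = {"s1": 0.5, "s2": 0.25, "s3": 0.25}
q = {"s1": [4.0, 0.0], "s2": [0.0, 4.0], "s3": [0.0, 4.0]}
early = reveal_at_r(b, q)          # 4.0
late = reveal_at_r_plus_1(b, q)    # 2.0
```

Here no single action is good in every state, so forcing a shared action halves the bound; revealing less often yields tighter bounds at the cost of tracking more beliefs.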

Theorems & Definitions (38)

Definition 1: Dec-POMDP
Definition 2
Definition 3
Definition 4
Definition 5
Definition 6
Lemma 1
Definition 7
Definition 8
Lemma 2
...and 28 more