Factored Online Planning in Many-Agent POMDPs

Maris F. L. Galesloot, Thiago D. Simão, Sebastian Junges, Nils Jansen

TL;DR

This work tackles the dual challenges of value estimation and belief estimation in online planning for many-agent MPOMDPs. It introduces four algorithms (FS-W-POMCP, FT-W-POMCP, FS-PFT, and FT-PFT) that combine coordination-graph-based value factorization with weighted particle filtering to scale both value and belief estimates. Empirical results show that structured value decomposition and weighted particle beliefs improve over state-of-the-art baselines when many agents are present, enabling planning with dozens of agents in benchmarks such as FireFightingGraph and MARS. The proposed methods offer a scalable route to centralized planning in dynamic, partially observable multi-agent environments, with potential extensions to continuous spaces and learned coordination graphs.

Abstract

In centralized multi-agent systems, often modeled as multi-agent partially observable Markov decision processes (MPOMDPs), the action and observation spaces grow exponentially with the number of agents, making the value and belief estimation of single-agent online planning ineffective. Prior work partially tackles value estimation by exploiting the inherent structure of multi-agent settings via so-called coordination graphs. Additionally, belief estimation methods have been improved by incorporating the likelihood of observations into the approximation. However, the challenges of value estimation and belief estimation have only been tackled individually, which prevents existing methods from scaling to settings with many agents. Therefore, we address these challenges simultaneously. First, we introduce weighted particle filtering to a sample-based online planner for MPOMDPs. Second, we present a scalable approximation of the belief. Third, we bring an approach that exploits the typical locality of agent interactions to novel online planning algorithms for MPOMDPs operating on a so-called sparse particle filter tree. Our experimental evaluation against several state-of-the-art baselines shows that our methods (1) are competitive in settings with only a few agents and (2) improve over the baselines in the presence of many agents.
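
To make the idea of "incorporating the likelihood of observations into the approximation" concrete, the following is a minimal sketch of one importance-weighted particle belief update. It is not the paper's implementation: the function names (`weighted_belief_update`, `stay`, `noisy_sensor`), the two-state toy model, and the 0.8 sensor accuracy are all assumptions made for illustration.

```python
import random


def weighted_belief_update(particles, weights, action, observation,
                           transition_sample, observation_likelihood, rng):
    """Propagate each particle and reweight it by the observation likelihood.

    All callables are hypothetical stand-ins for a generative model:
    transition_sample(state, action, rng) -> next_state
    observation_likelihood(observation, next_state, action) -> probability
    """
    new_particles, new_weights = [], []
    for state, weight in zip(particles, weights):
        next_state = transition_sample(state, action, rng)
        likelihood = observation_likelihood(observation, next_state, action)
        new_particles.append(next_state)
        new_weights.append(weight * likelihood)

    total = sum(new_weights)
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every particle")
    # Normalize so the weights form a probability distribution again.
    return new_particles, [w / total for w in new_weights]


# Toy model (assumed): the state never changes and the sensor is right 80% of the time.
def stay(state, action, rng):
    return state


def noisy_sensor(observation, next_state, action):
    return 0.8 if observation == next_state else 0.2


rng = random.Random(0)
particles = ["fire", "no_fire", "fire", "no_fire"]
weights = [0.25, 0.25, 0.25, 0.25]
particles, weights = weighted_belief_update(
    particles, weights, "wait", "fire", stay, noisy_sensor, rng
)
belief_fire = sum(w for s, w in zip(particles, weights) if s == "fire")
print(round(belief_fire, 3))
```

Unlike an unweighted filter that rejects every particle whose simulated observation does not match the real one, this update keeps all particles and shifts probability mass toward the ones that explain the observation. That matters when the joint observation space is large and exact matches are rare.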

Paper Structure

This paper contains 67 sections, 1 theorem, 9 equations, 13 figures, 2 tables, 6 algorithms.

Key Result

Theorem 1

If for edges with overlap $e$, $e'$, and any two $\vec{a}_{e'\cap e}, \vec{a}'_{e'\cap e} \in \mathcal{A}_{e'\setminus e}$, with $\mathcal{A}_{e'\setminus e}$ the set of actions with overlap, the true value function $Q$ satisfies: then MoE optimization will return an $\epsilon$-optimal joint action in the limit.
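
Theorem 1 concerns selecting a joint action from values attached to the edges of a coordination graph. As a rough illustration of that setting only, the sketch below represents a factored value as a sum of local edge payoffs and finds the maximizing joint action by brute-force enumeration. This is not the MoE optimization the theorem refers to, and enumeration is exponential in the number of agents; it stands in for scalable procedures such as the Max-Plus and Variable Elimination methods named in the Figure 4 caption. The three-agent chain, the payoff tables, and the function names are hypothetical.

```python
from itertools import product


def factored_value(joint_action, edge_payoffs):
    """Sum the local payoffs Q_e(a_e) over all edges e = (i, j)."""
    return sum(
        table[(joint_action[i], joint_action[j])]
        for (i, j), table in edge_payoffs.items()
    )


def best_joint_action(num_agents, actions, edge_payoffs):
    """Enumerate all joint actions; only feasible for very few agents."""
    return max(
        product(actions, repeat=num_agents),
        key=lambda joint_action: factored_value(joint_action, edge_payoffs),
    )


# Hypothetical chain graph 0 - 1 - 2 with binary actions per agent.
edge_payoffs = {
    (0, 1): {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0},
    (1, 2): {(0, 0): 3.0, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 1.0},
}

best = best_joint_action(3, [0, 1], edge_payoffs)
print(best, factored_value(best, edge_payoffs))
```

Agent 1 appears in both edges, so its action couples the two local payoffs. This is the kind of overlap between edges $e$ and $e'$ that the theorem's condition addresses.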

Figures (13)

  • Figure 1: Performance comparison for POMCP variants with (solid) and without (dotted) weighted particle filtering.
  • Figure 2: Comparison between Sparse-PFT (dotted) and our PFT variants with value factorization (solid).
  • Figure 3: Performance comparison for our FS/FT-W-POMCP (dotted) and FS/FT-PFT (solid) methods.
  • Figure 4: Comparison across algorithm variants on FireFightingGraph for both Max-Plus and Variable Elimination.
  • Figure 5: Performance on two smaller maps of Multi-Agent RockSample, comparing W-POMCP variants to POMCP variants as in Figure 1 of the main paper.
  • ...and 8 more figures

Theorems & Definitions (5)

  • Definition 1: MPOMDP
  • Definition 2: PB-MMDP
  • Definition 3: PB-MMDP
  • Theorem 1
  • proof