Bridging the Gap between Partially Observable Stochastic Games and Sparse POMDP Methods

Tyler Becker; Zachary Sunberg

TL;DR

This paper proposes a unifying framework that combines POMDP-inspired state distribution approximation with game-theoretic equilibrium search on information sets. The framework enables planning in POSGs with very large state spaces, paving the way for reliable autonomous interaction in real-world physical environments and complementing multi-agent reinforcement learning.

Abstract

Many real-world decision problems involve the interaction of multiple self-interested agents with limited sensing ability. The partially observable stochastic game (POSG) provides a mathematical framework for modeling these problems; however, solving a POSG requires difficult reasoning over two critical factors: (1) information revealed by partial observations and (2) decisions other agents make. In the single-agent case, partially observable Markov decision process (POMDP) planning can efficiently address partial observability with particle filtering. In the multi-agent case, extensive form game solution methods account for other agents' decisions, but preclude belief approximation. We propose a unifying framework that combines POMDP-inspired state distribution approximation and game-theoretic equilibrium search on information sets. This paper lays a theoretical foundation for the approach by bounding errors due to belief approximation, and empirically demonstrates its effectiveness with a numerical example. The new approach enables planning in POSGs with very large state spaces, paving the way for reliable autonomous interaction in real-world physical environments and complementing multi-agent reinforcement learning.

Paper Structure (21 sections, 7 theorems, 45 equations, 3 figures)

Introduction
Background
Partially observable stochastic games
Particle filtering and tree search for POMDPs
Imperfect information extensive form games
Limitations of POMDP approaches
Limitations of EFG approaches
Summary of Improvements
Conditional distribution information set trees
Joint conditional distribution trees
Combining joint conditional distributions with information sets
Finding approximate Nash equilibria on CDITs for zero sum games
Convergence guarantees for approximate Nash equilibria on CDITs
Suboptimality of Approximate Games
Particle CDIT Policy Evaluation Error
...and 6 more sections

Key Result

lemma 1

In a game with payoff matrices $A$, the deviation incentive for Player $i$ from a policy $\pi$ can be upper bounded by and

Figures (3)

Figure 1: Illustration of a CDIT (left) and its particle approximation (right) for a POSG with $\mathcal{A}^1 = \mathcal{O}^2 = \{1, 2\}$ and $\mathcal{A}^2 = \mathcal{O}^1 = \{1\}$.
Figure 2: Continuous Tag policies marginalized over observations
Figure 3: Continuous Tag exploitability; $3\sigma$ standard error bounds shaded

Theorems & Definitions (7)

lemma 1
theorem 1
lemma 2
lemma 3
theorem 2
theorem 3
theorem 4
