Bridging the Gap between Partially Observable Stochastic Games and Sparse POMDP Methods

Tyler Becker, Zachary Sunberg

TL;DR

We propose a unifying framework that combines POMDP-inspired state distribution approximation with game-theoretic equilibrium search on information sets. The framework enables planning in POSGs with very large state spaces, paving the way for reliable autonomous interaction in real-world physical environments and complementing multi-agent reinforcement learning.

Abstract

Many real-world decision problems involve the interaction of multiple self-interested agents with limited sensing ability. The partially observable stochastic game (POSG) provides a mathematical framework for modeling these problems; however, solving a POSG requires difficult reasoning over two critical factors: (1) information revealed by partial observations and (2) decisions other agents make. In the single-agent case, partially observable Markov decision process (POMDP) planning can efficiently address partial observability with particle filtering. In the multi-agent case, extensive-form game solution methods account for other agents' decisions, but preclude belief approximation. We propose a unifying framework that combines POMDP-inspired state distribution approximation and game-theoretic equilibrium search on information sets. This paper lays a theoretical foundation for the approach by bounding errors due to belief approximation, and empirically demonstrates its effectiveness with a numerical example. The new approach enables planning in POSGs with very large state spaces, paving the way for reliable autonomous interaction in real-world physical environments and complementing multi-agent reinforcement learning.

Paper Structure

This paper contains 21 sections, 7 theorems, 45 equations, and 3 figures.

Key Result

Lemma 1

In a game with payoff matrices $A$, the deviation incentive for Player $i$ from a policy $\pi$ can be upper bounded by and
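The bound itself is not reproduced in this summary. As background only, the quantity being bounded can be illustrated with a minimal sketch. It assumes the standard two-player definition of deviation incentive: the best payoff a player can obtain by unilaterally deviating, minus the payoff under the current policy. The function names, the two-player restriction, and the matching-pennies payoffs are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def expected_payoff(A: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """Expected payoff x^T A y when row plays mixed strategy x and column plays y."""
    return sum(x[r] * A[r][c] * y[c] for r in range(len(x)) for c in range(len(y)))


def deviation_incentive(A_i: Matrix, x: Sequence[float], y: Sequence[float],
                        player: int) -> float:
    """Gain available to `player` (0 = row, 1 = column) from a unilateral deviation.

    A_i[r][c] is the payoff to `player` when row plays r and column plays c.
    Assumed definition: best pure-deviation payoff minus current payoff.
    """
    current = expected_payoff(A_i, x, y)
    if player == 0:
        # Row player deviates to the best pure row against y.
        best = max(sum(A_i[r][c] * y[c] for c in range(len(y)))
                   for r in range(len(x)))
    else:
        # Column player deviates to the best pure column against x.
        best = max(sum(x[r] * A_i[r][c] for r in range(len(x)))
                   for c in range(len(y)))
    return best - current


# Hypothetical example: matching pennies (zero-sum).
A1: Matrix = [[1.0, -1.0], [-1.0, 1.0]]   # payoffs to the row player
A2: Matrix = [[-1.0, 1.0], [1.0, -1.0]]   # payoffs to the column player

uniform = [0.5, 0.5]
at_equilibrium = deviation_incentive(A1, uniform, uniform, player=0)
off_equilibrium = deviation_incentive(A2, [1.0, 0.0], uniform, player=1)
```

At the uniform equilibrium neither player can gain by deviating, so the incentive is zero. When the row player commits to a single row, the column player's incentive becomes positive, which is the kind of gap that an exploitability measure aggregates.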

Figures (3)

  • Figure 1: Illustration of a CDIT (left) and its particle approximation (right) for a POSG with $\mathcal{A}^1 = \mathcal{O}^2 = \{1,2\}$ and $\mathcal{A}^2 = \mathcal{O}^1 = \{1\}$.
  • Figure 2: Continuous Tag policies marginalized over observations
  • Figure 3: Continuous Tag exploitability; $3\sigma$ standard error bounds shaded

Theorems & Definitions (7)

  • Lemma 1
  • Theorem 1
  • Lemma 2
  • Lemma 3
  • Theorem 2
  • Theorem 3
  • Theorem 4