An $ε$-Optimal Sequential Approach for Solving zs-POSGs

Jilles S. Dibangoye, Matthia Sabatelli, Erwan C. Escudie

TL;DR

This work rigorously recasts the simultaneous interaction as a sequential decision process via the principle of separation, and introduces distinct sufficient statistics for valuation and execution, the sequential occupancy state and the private occupancy family, which reveal a latent geometry in the optimal value function.

Abstract

While recent reductions of zero-sum partially observable stochastic games (zs-POSGs) to transition-independent stochastic games (TI-SGs) theoretically admit dynamic programming, practical solutions remain stifled by the inherent non-linearity and exponential complexity of the simultaneous minimax backup. In this work, we surmount this computational barrier by rigorously recasting the simultaneous interaction as a sequential decision process via the principle of separation. We introduce distinct sufficient statistics for valuation and execution, the sequential occupancy state and the private occupancy family, which reveal a latent geometry in the optimal value function. This structural insight allows us to linearise the backup operator, reducing the update complexity from exponential to polynomial while enabling the direct extraction of safe policies without heuristic bookkeeping. Experimental results demonstrate that algorithms leveraging this sequential framework significantly outperform state-of-the-art methods, effectively rendering previously intractable domains solvable.

Paper Structure

This paper contains 64 sections, 12 theorems, 43 equations, 6 figures, 4 tables, 4 algorithms.

Key Result

Theorem 3.3

For any transient sub-stage state $x_{i}\in\mathcal{X}_{\mathrm{seq}}$, where $\gamma_{1}=1$ and $\gamma_{2}=\gamma$.

Figures (6)

  • Figure 1: The influence diagram for the sequential occupancy game from the sequential central planner.
  • Figure 2: The Max-of-Concave Geometry. This diagram visualizes the value function structure projected onto the probability simplex. The horizontal axis represents the space of normalized local beliefs $\tilde{b}_{h_{2}} \doteq b_{h_{2}} / \|b_{h_{2}}\|_1$. The thick highlighted curve corresponds to the upper envelope (optimism) over a family of concave functions (pessimism).
  • Figure 3: PBVI on Adversarial Tiger ($\ell{=}5$): exploitability, value, cumulative runtime, and sample count over iterations, averaged over random seeds. Green: simultaneous; red: sequential.
  • Figure 4: Sequential Transition-Independent Stochastic Game. The diagram illustrates the interleaved dynamics extending over stages $t_0$ and $t_1$. At each sub-stage $(1,t)$, Player 1 is active, updating their local state $\mathbf{o}_{1}$ and driving the transition $x_{1,t} \to x_{2,t}$. At sub-stage $(2,t)$, Player 2 acts, updating $\mathbf{o}_{2}$ and completing the stage transition $x_{2,t} \to x_{1,t+1}$, while generating the sequential reward $\rho_{\mathrm{seq}}$.
  • Figure 5: Top: Exploitability of PBVI across iterations on different benchmarks ($\ell=5$). The green curve corresponds to the simultaneous variant, while the red curve represents the sequential variant. Bottom: Number of points sampled at each iteration by simultaneous and sequential variants.
  • ...and 1 more figure
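
The max-of-concave shape described in the Figure 2 caption can be illustrated numerically. The sketch below is not the paper's algorithm. It assumes, purely for illustration, that each concave function is stored as a pointwise minimum over a finite set of linear functions, and that the value is the maximum over those concave pieces. The function names (`normalize_belief`, `max_of_concave`) and all numbers are hypothetical.

```python
import numpy as np


def normalize_belief(b):
    """Return b / ||b||_1, the normalized local belief from the Figure 2 caption."""
    b = np.asarray(b, dtype=float)
    total = np.abs(b).sum()
    if total == 0.0:
        raise ValueError("belief has zero mass and cannot be normalized")
    return b / total


def max_of_concave(b_tilde, families):
    """Evaluate an upper envelope over concave pieces at b_tilde.

    Assumption (not from the paper): each entry of `families` is a list of
    linear functions (rows), and the concave piece is their pointwise minimum.
    """
    pieces = [np.min(np.asarray(alphas, dtype=float) @ b_tilde) for alphas in families]
    return max(pieces)


# Hypothetical data: two concave pieces over a two-point simplex.
families = [
    [[1.0, 0.0], [0.0, 1.0]],  # min(b0, b1), a tent peaking at the centre
    [[0.3, 0.3]],              # a single linear function, constant on the simplex
]

b_tilde = normalize_belief([2.0, 6.0])
value = max_of_concave(b_tilde, families)
print(b_tilde, value)
```

Near the corners of the simplex the constant piece dominates, while near the centre the tent-shaped piece takes over, so the envelope is neither convex nor concave overall.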

Theorems & Definitions (28)

  • Definition 3.1
  • Definition 3.2: Sequential Occupancy Game
  • Theorem 3.3: Sequential Optimality Equations
  • Theorem 3.4: Sufficiency
  • Theorem 4.1
  • Lemma 4.2
  • Lemma 4.3
  • Theorem 4.4
  • Theorem 5.1: Sequential lossless reduction
  • Lemma 1.1
  • ...and 18 more