Bounded-Memory Strategies in Partial-Information Games

Sougata Bose, Rasmus Ibsen-Jensen, Patrick Totzke

TL;DR

This work studies the complexity of solving partial-information stochastic games with mean-payoff objectives when players are restricted to bounded-memory strategies. It establishes hardness results (NP-hardness for 1-player threshold values and coNP-hardness for 2-player zero-sum thresholds) and complements them with upper bounds that place several decision and search problems in PSPACE and FNP[NP], via an encoding in the first-order theory of the reals, FO(ℝ), and iterative Markov-chain reductions. The authors introduce state-elimination and loop-elimination techniques to compute mean-payoff values of Markov chains and use the FO(ℝ) encoding to derive polynomial-size witnesses for equilibria, which allows ε-optimal strategies and ε-Nash equilibria to be approximated under memory bounds. The constructions also apply to parity objectives and extend to multi-player concurrent games and to staying-in-a-set/quitting game variants. Overall, the paper gives complexity lower and upper bounds for bounded-memory play in imperfect-information games and improves the known bounds for several well-known special cases.

Abstract

We study the computational complexity of solving stochastic games with mean-payoff objectives. Instead of identifying special classes in which simple strategies are sufficient to play $ε$-optimally, or form $ε$-Nash equilibria, we consider general partial-information multiplayer games and ask what can be achieved with (and against) finite-memory strategies up to a given bound on the memory. We show $NP$-hardness for approximating zero-sum values, already with respect to memoryless strategies and for 1-player reachability games. On the other hand, we provide upper bounds for solving games of any fixed number of players $k$. We show that one can decide in polynomial space if, for a given $k$-player game, $ε\ge 0$ and bound $b$, there exists an $ε$-Nash equilibrium in which all strategies use at most $b$ memory modes. For given $ε>0$, finding an $ε$-Nash equilibrium with respect to $b$-bounded strategies can be done in $FNP[NP]$. Similarly, for 2-player zero-sum games, finding a $b$-bounded strategy that, against all $b$-bounded opponent strategies, guarantees an outcome within $ε$ of a given value can be done in $FNP[NP]$. Our constructions apply to parity objectives with minimal simplifications. Our results improve the status quo in several well-known special cases of games. In particular, for $2$-player zero-sum concurrent mean-payoff games, one can approximate ordinary zero-sum values (without restricting admissible strategies) in $FNP[NP]$.
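
As a reading aid, the equilibrium notion used in the abstract can be written out explicitly. The notation is assumed here for illustration and may differ from the paper's: $u_i$ denotes player $i$'s expected mean payoff, $\Sigma_i^{b}$ the set of player $i$'s strategies with at most $b$ memory modes, and $\sigma_{-i}$ the strategies of all players other than $i$. Under these assumptions, a profile $\sigma = (\sigma_1, \dots, \sigma_k)$ with every $\sigma_i \in \Sigma_i^{b}$ is an $ε$-Nash equilibrium with respect to $b$-bounded strategies if no player gains more than $ε$ by a $b$-bounded deviation:

$$
\forall i \in \{1, \dots, k\}\;\; \forall \sigma_i' \in \Sigma_i^{b}: \quad u_i(\sigma_i', \sigma_{-i}) \;\le\; u_i(\sigma_i, \sigma_{-i}) + ε .
$$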

Paper Structure

This paper contains 34 sections, 33 theorems, 50 equations, 5 figures.

Key Result

theorem 1

For every fixed $k\ge 1$, the following is in $\mathsf{PSPACE}$.

Figures (5)

  • Figure 1: Stages $0<j\le m$ in game $G_\varphi$. In step $j\leq m$ the player receives signal $j$ and the pebble is moved out of some state $(*,j)$.
  • Figure 2: Elimination of state $n$: every length-two path via state $n$ in $M$ (on the left) is removed and the corresponding direct edge is re-weighted in $M'$ (on the right). The expected reward and duration of going from $i$ to $j$ remain the same.
  • Figure 3: Elimination of loop $n$ in $M$ (left). In $M'$ (right), the corresponding edge has probability $0$ and all other edges from $n$ are re-weighted to match the expected duration and expected reward of paths that start by iterating the loop.
  • Figure 4: State elimination: On the left is (part of) a Markov chain $M$ before removing state $n$, and on the right is the corresponding part of the Markov chain $M'$. In the middle is the intermediate step $M''$ as constructed in the proof of the state-elimination lemma. The probability, expected duration, and expected reward of moving from state $i$ to $j$ remain unchanged.
  • Figure 5: Loop elimination: On the left is (part of) the Markov chain $M$ before eliminating the self-loop in vertex $n$; on the right is the resulting chain $M'$. In the middle is the intermediate chain $M''$ with countably infinitely many edges between $n$ and the auxiliary state $n'$. Taking the $\ell$th edge represents taking the loop $\ell$ times.
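
The state- and loop-elimination steps described in Figures 2 to 5 can be sketched in Python as follows. This is a minimal illustration and not the paper's construction: the matrix representation (`P`, `R`, `D` for edge probabilities, rewards, and durations), the function names, and the restriction to irreducible chains are all assumptions made here, and the mean payoff is taken to be expected reward per unit of expected duration.

```python
from fractions import Fraction


def eliminate_loop(P, R, D, n):
    """Remove the self-loop at state n in place (hypothetical helper)."""
    q = P[n][n]
    if q == 0:
        return
    if q == 1:
        raise ValueError("state n is absorbing; its loop cannot be eliminated")
    extra = q / (1 - q)  # expected number of loop iterations before leaving n
    loop_r, loop_d = R[n][n], D[n][n]
    for j in range(len(P)):
        if j != n and P[n][j] > 0:
            P[n][j] /= 1 - q              # renormalise the exit probabilities
            R[n][j] += extra * loop_r     # charge the looping reward to the exit edge
            D[n][j] += extra * loop_d     # charge the looping duration as well
    P[n][n] = R[n][n] = D[n][n] = Fraction(0)


def eliminate_state(P, R, D):
    """Remove the last state n (no self-loop allowed); return smaller matrices."""
    n = len(P) - 1
    assert P[n][n] == 0, "eliminate the self-loop first"
    zero = Fraction(0)
    newP = [[zero] * n for _ in range(n)]
    newR = [[zero] * n for _ in range(n)]
    newD = [[zero] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            direct = P[i][j]
            via = P[i][n] * P[n][j]       # probability of the path i -> n -> j
            total = direct + via
            if total == 0:
                continue
            newP[i][j] = total
            # Conditional expectation over the direct edge and the path via n.
            newR[i][j] = (direct * R[i][j] + via * (R[i][n] + R[n][j])) / total
            newD[i][j] = (direct * D[i][j] + via * (D[i][n] + D[n][j])) / total
    return newP, newR, newD


def mean_payoff(P, R, D):
    """Mean payoff of an irreducible chain, by eliminating states one at a time."""
    # Copy the inputs as exact rationals so the caller's matrices are untouched.
    P, R, D = ([[Fraction(x) for x in row] for row in M] for M in (P, R, D))
    while len(P) > 1:
        eliminate_loop(P, R, D, len(P) - 1)
        P, R, D = eliminate_state(P, R, D)
    # One state remains with a probability-1 self-loop: reward over duration.
    return R[0][0] / D[0][0]
```

For instance, with unit durations, `mean_payoff([[0, 1], [1, 0]], [[0, 1], [0, 0]], [[1, 1], [1, 1]])` alternates between an edge of reward 1 and an edge of reward 0. Exact rational arithmetic is used so that the preserved quantities can be compared without rounding error.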

Theorems & Definitions (64)

  • definition 1
  • theorem 1
  • theorem 2
  • theorem 3
  • lemma 1
  • proof
  • proof: Proof of the lower-bound theorem
  • definition 2: State Elimination
  • lemma 2
  • definition 3: Loop Elimination
  • ...and 54 more