Bounded-Memory Strategies in Partial-Information Games

Sougata Bose; Rasmus Ibsen-Jensen; Patrick Totzke

TL;DR

This work investigates the complexity of solving stochastic games with mean-payoff objectives under partial information when players are restricted to bounded-memory strategies. It establishes hardness results (NP-hardness for 1-player threshold values and coNP-hardness for 2-player zero-sum thresholds) and complements them with upper bounds that place several decision and search problems in PSPACE and FNP[NP]. The upper bounds rest on an encoding in the first-order theory of the reals, FO(ℝ), and on iterative reductions of Markov chains. The authors introduce state-elimination and loop-elimination techniques to compute mean-payoff values of Markov chains and use the FO(ℝ) encodings to derive polynomial-size witnesses, which allows ε-optimal strategies and ε-Nash equilibria to be approximated under memory bounds. The constructions carry over to parity objectives and extend to multi-player concurrent games and to staying-in-a-set and quitting game variants. Overall, the paper clarifies what bounded-memory strategies can achieve in imperfect-information settings and gives complexity upper bounds for approximating equilibria and values in game classes where exact solutions are out of reach.

Abstract

We study the computational complexity of solving stochastic games with mean-payoff objectives. Instead of identifying special classes in which simple strategies are sufficient to play $\varepsilon$-optimally, or form $\varepsilon$-Nash equilibria, we consider general partial-information multiplayer games and ask what can be achieved with (and against) finite-memory strategies up to a given bound on the memory. We show $\mathsf{NP}$-hardness for approximating zero-sum values, already with respect to memoryless strategies and for 1-player reachability games. On the other hand, we provide upper bounds for solving games of any fixed number of players $k$. We show that one can decide in polynomial space if, for a given $k$-player game, $\varepsilon\ge 0$ and bound $b$, there exists an $\varepsilon$-Nash equilibrium in which all strategies use at most $b$ memory modes. For given $\varepsilon>0$, finding an $\varepsilon$-Nash equilibrium with respect to $b$-bounded strategies can be done in $\mathsf{FNP}[\mathsf{NP}]$. Similarly, for 2-player zero-sum games, finding a $b$-bounded strategy that, against all $b$-bounded opponent strategies, guarantees an outcome within $\varepsilon$ of a given value can be done in $\mathsf{FNP}[\mathsf{NP}]$. Our constructions apply to parity objectives with minimal simplifications. Our results improve the status quo in several well-known special cases of games. In particular, for $2$-player zero-sum concurrent mean-payoff games, one can approximate ordinary zero-sum values (without restricting admissible strategies) in $\mathsf{FNP}[\mathsf{NP}]$.

Paper Structure (34 sections, 33 theorems, 50 equations, 5 figures)

Introduction
Notations
Overview of Our Results
Lower Bound
Mean-Payoff Values for Markov Chains
Expressibility in $\mathrm{FO}(\mathbb{R})$
Step 1: Guessing strategies and strategy profiles
Approximation Algorithms
Floating point representations
Polynomial witnesses
Computing Approximations for Markov chains
Proof of Theorem 7.3
First algorithm (for (1))
Second algorithm (for (2))
Running time and correctness of the algorithms
...and 19 more sections

Key Result

theorem 1

For every fixed $k\ge 1$ the following is in $\mathsf{PSPACE}$.

Figures (5)

Figure 1: Stages $0<j\le m$ in game $G_\varphi$. In step $j\leq m$ the player receives signal $j$ and the pebble is moved out of some state $(*,j)$.
Figure 2: Elimination of state $n$: every length-two path via state $n$ in $M$ (on the left) is removed and the corresponding direct edge is re-weighted in $M'$ (on the right). The expected reward and duration of going from $i$ to $j$ remain the same.
Figure 3: Elimination of loop $n$ in $M$ (left). In $M'$ (right), the corresponding edge has probability $0$ and all other edges from $n$ are re-weighted to match the expected duration and expected reward of paths that start by iterating the loop.
Figure 4: State elimination: On the left is (part of a) Markov chain $M$ before removing state $n$ and on the right is the corresponding part of Markov chain $M'$. In the middle is the intermediate step $M''$ as constructed in the proof of \ref{lem:state-elim}. The probability, expected duration and expected reward of moving from state $i$ to $j$ remain unchanged.
Figure 5: Loop elimination: On the left is (part of the) Markov chain $M$ before eliminating the self-loop in vertex $n$; on the right is the resulting chain $M'$. In the middle is the intermediate chain $M''$ with countably infinitely many edges between $n$ and auxiliary state $n'$. Taking the $\ell$th edge represents taking the loop $\ell$ times.
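
The two elimination steps described in the captions of Figures 2 to 5 can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that every edge carries a triple (probability, expected reward, expected duration), that the chain is irreducible, and that the mean-payoff value is the long-run ratio of reward to duration. The names `eliminate_loop`, `eliminate_state` and `mean_payoff`, and the dictionary layout, are hypothetical choices made for this example.

```python
from fractions import Fraction as F

# Assumed layout: chain[i][j] = (probability, expected reward, expected duration).

def eliminate_loop(chain, n):
    """Remove the self-loop at n and re-weight the other edges leaving n."""
    if n not in chain[n]:
        return
    q, r_loop, d_loop = chain[n].pop(n)
    if q == 1:
        raise ValueError("state %r is absorbing; the loop cannot be removed" % n)
    loops = q / (1 - q)  # expected number of loop iterations before leaving n
    for j, (p, r, d) in list(chain[n].items()):
        chain[n][j] = (p / (1 - q), r + loops * r_loop, d + loops * d_loop)

def eliminate_state(chain, n):
    """Remove loop-free state n, folding each path i -> n -> j into edge i -> j."""
    out_n = chain.pop(n)
    assert n not in out_n, "eliminate the self-loop first"
    for out_i in chain.values():
        if n not in out_i:
            continue
        p_in, r_in, d_in = out_i.pop(n)
        for j, (p_nj, r_nj, d_nj) in out_n.items():
            p_via = p_in * p_nj
            p_dir, r_dir, d_dir = out_i.get(j, (F(0), F(0), F(0)))
            p_new = p_dir + p_via
            if p_new == 0:
                continue
            # Conditional expectations over the direct edge and the detour via n.
            r_new = (p_dir * r_dir + p_via * (r_in + r_nj)) / p_new
            d_new = (p_dir * d_dir + p_via * (d_in + d_nj)) / p_new
            out_i[j] = (p_new, r_new, d_new)

def mean_payoff(chain):
    """Reduce an irreducible chain to one state; return reward per unit duration."""
    chain = {i: dict(out) for i, out in chain.items()}  # work on a copy
    states = sorted(chain)
    while len(states) > 1:
        n = states.pop()
        eliminate_loop(chain, n)
        eliminate_state(chain, n)
    (last,) = states
    _, r, d = chain[last][last]
    return r / d

# Example: state 0 loops with probability 1/2 (reward 1) or moves to state 1;
# state 1 returns to state 0 with reward 4. All durations are 1.
example = {
    0: {0: (F(1, 2), F(1), F(1)), 1: (F(1, 2), F(0), F(1))},
    1: {0: (F(1), F(4), F(1))},
}
value = mean_payoff(example)
```

Exact rational arithmetic is used here so that the invariants stated in the captions (unchanged probability, expected reward and expected duration between the remaining states) can be checked without rounding error. For the example chain the stationary distribution is (2/3, 1/3), so the expected reward per step is 5/3, which is the value the sketch returns.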

Theorems & Definitions (64)

definition 1
theorem 1
theorem 2
theorem 3
lemma 1
proof
proof: Proof of \ref{thm:lower-bound}
definition 2: State Elimination
lemma 2
definition 3: Loop Elimination
...and 54 more
