Uniform Value and Decidability in Ergodic Blind Stochastic Games

Krishnendu Chatterjee, David Lurie, Raimundo Saona, Bruno Ziliotto

TL;DR

This work investigates the existence and computability of the uniform value in blind stochastic games, in which the players never observe the state, with a focus on an ergodic subclass. The authors develop a matrix-ergodicity framework showing that forward products of transition matrices become nearly indistinguishable across initial states (the rows, one per initial state, become nearly equal). This property enables a finite-state abstract game that approximates the belief dynamics. They prove that every ergodic blind stochastic game has a uniform value and that approximating this value is decidable (with a 2-EXPSPACE upper bound), whereas computing the exact value is undecidable in general. In addition, the uniform value does not depend on the initial belief. The results draw a sharp line between the decidable approximation problem and the undecidable exact problem in ergodic blind games, and the decidability result is new even for single-player POMDPs.
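The matrix-ergodicity idea above, that forward products of transition matrices stop depending on the initial state, can be illustrated numerically. This is a minimal sketch, not the paper's construction: it assumes the classical Dobrushin coefficient as the measure of ergodicity (the paper's Definition 3.2 may use a different coefficient), and the two-state game, its transition matrices, and the sequence of action pairs are all hypothetical. Every matrix here has a coefficient below 1, so by the submultiplicativity property τ(AB) ≤ τ(A)τ(B) the coefficient of the product, and with it the gap between the rows, decays geometrically.

```python
import numpy as np

def dobrushin_coefficient(P):
    """Half the largest L1 distance between two rows of a row-stochastic matrix P.

    0 means every row is identical (the matrix forgets the starting state);
    1 means some pair of rows has disjoint supports.
    """
    n = P.shape[0]
    return float(max(0.5 * np.abs(P[i] - P[j]).sum()
                     for i in range(n) for j in range(n)))

def forward_product(matrices):
    """Forward product P_1 P_2 ... P_m, multiplied left to right in play order."""
    result = np.eye(matrices[0].shape[0])
    for P in matrices:
        result = result @ P
    return result

# Hypothetical 2-state game: one transition matrix per action pair (a, b).
TRANSITIONS = {
    (0, 0): np.array([[0.9, 0.1], [0.2, 0.8]]),
    (0, 1): np.array([[0.7, 0.3], [0.4, 0.6]]),
    (1, 0): np.array([[0.6, 0.4], [0.3, 0.7]]),
    (1, 1): np.array([[0.8, 0.2], [0.1, 0.9]]),
}

# A fixed, hypothetical sequence of 20 action pairs.
plays = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5

for m in (1, 5, 20):
    Q = forward_product([TRANSITIONS[k] for k in plays[:m]])
    # Row i of Q is the state distribution after m stages when starting in state i.
    print(m, dobrushin_coefficient(Q))
```

A small coefficient for the product means that where the play started has almost no influence on where it is after m stages, which is the informal content of "indistinguishable across initial states".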

Abstract

We study a class of two-player zero-sum stochastic games known as blind stochastic games, in which the players neither observe the state nor receive any information about it during the game. A central concept for analyzing long-duration stochastic games is the uniform value. A game has a uniform value $v$ if, for every $\varepsilon>0$, Player 1 (resp., Player 2) has a single strategy guaranteeing that, against every strategy of the opponent and for all sufficiently large $n$, the expected average payoff to Player 1 over $n$ stages is at least $v-\varepsilon$ (resp., at most $v+\varepsilon$). Prior work has shown that the uniform value may not exist in general blind stochastic games. To address this, we introduce a subclass called ergodic blind stochastic games, defined by imposing an ergodicity condition on the state transitions. For this subclass, we prove the existence of the uniform value and provide an algorithm to approximate it, establishing the decidability of the approximation problem. Notably, this decidability result is novel even in the single-player setting of Partially Observable Markov Decision Processes (POMDPs). Furthermore, we show that no algorithm can compute the uniform value exactly, emphasizing the tightness of our result. Finally, we establish that the uniform value is independent of the initial belief.
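For reference, the uniform-value condition can be written out in symbols. The notation (initial belief $p_1$, states $s_t$, actions $a_t$ and $b_t$, stage payoff $g$, strategies $\sigma$ and $\tau$, and the $n$-stage average payoff $\gamma_n$) is assumed here for illustration and may not match the paper's Definition 2.1.

```latex
\gamma_n(p_1,\sigma,\tau) = \mathbb{E}^{p_1}_{\sigma,\tau}\Big[\frac{1}{n}\sum_{t=1}^{n} g(s_t,a_t,b_t)\Big],
\qquad
v \text{ is the uniform value if } \forall \varepsilon>0\ \exists \sigma_\varepsilon,\tau_\varepsilon,N \text{ such that } \forall n\ge N:
\begin{cases}
\gamma_n(p_1,\sigma_\varepsilon,\tau) \ge v-\varepsilon & \text{for every strategy } \tau \text{ of Player 2},\\
\gamma_n(p_1,\sigma,\tau_\varepsilon) \le v+\varepsilon & \text{for every strategy } \sigma \text{ of Player 1}.
\end{cases}
```

The essential point is that $\sigma_\varepsilon$ and $\tau_\varepsilon$ do not depend on $n$: a single strategy must perform well for every sufficiently long horizon.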

Paper Structure

This paper contains 29 sections, 9 theorems, 53 equations, 2 tables, and 1 algorithm.

Key Result

Theorem 3.5

All ergodic blind stochastic games have a uniform value. Moreover, the decision version of approximating the uniform value for the class of ergodic blind stochastic games is decidable.

Theorems & Definitions (32)

  • Definition 2.1: Uniform Value
  • Definition 2.2: Decision Version of Computing the Uniform Value
  • Definition 2.3: Decision Version of Approximating the Uniform Value
  • Definition 2.4: $m$-Stage History
  • Definition 2.5: $m$-Stage Belief
  • Definition 3.1: Ergodicity
  • Remark 3.1
  • Definition 3.2: Coefficient of Ergodicity
  • Definition 3.3: Ergodic Blind Stochastic Game
  • Remark 3.2
  • ...and 22 more
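Definitions 2.4 and 2.5 concern histories and beliefs. As a hedged sketch, assuming the m-stage belief is the distribution of the current state given the initial belief and the m action pairs played so far (the exact form of Definition 2.5 is not shown here), the update in a blind game reduces to multiplying the belief row vector by the transition matrix of each action pair, because no state signal is ever received. The three-state game, the names `belief_update` and `m_stage_belief`, and the history below are hypothetical.

```python
import numpy as np

# Hypothetical 3-state blind game: one row-stochastic matrix per action pair (a, b).
TRANSITIONS = {
    (0, 0): np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]),
    (0, 1): np.array([[0.3, 0.4, 0.3], [0.5, 0.25, 0.25], [0.2, 0.6, 0.2]]),
    (1, 0): np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]),
    (1, 1): np.array([[0.4, 0.4, 0.2], [0.3, 0.3, 0.4], [0.25, 0.5, 0.25]]),
}

def belief_update(belief, action_pair):
    """One-stage update: with no state signal, only the transition matrix is applied."""
    return belief @ TRANSITIONS[action_pair]

def m_stage_belief(initial_belief, history):
    """Belief over states after applying the action pairs in `history`, in order."""
    belief = np.asarray(initial_belief, dtype=float)
    for action_pair in history:
        belief = belief_update(belief, action_pair)
    return belief

history = [(0, 0), (1, 1), (0, 1), (1, 0)] * 5  # 20 hypothetical stages

# Two beliefs concentrated on different states, pushed through the same history.
p = m_stage_belief([1.0, 0.0, 0.0], history)
q = m_stage_belief([0.0, 0.0, 1.0], history)
print(p, q, np.abs(p - q).sum())
```

In this made-up game the two beliefs start on disjoint states yet end up almost equal. Since $\|xP\|_1 \le \|x\|_1$ for any row vector $x$ and row-stochastic $P$, the gap can never grow; it shrinks here because every matrix has a Dobrushin coefficient below 1. This only illustrates the kind of forgetting behind belief-independence and is not a proof of the paper's claim.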