Approximating the Uniform Value in Hidden Stochastic Games with Doeblin Conditions

Krishnendu Chatterjee; David Lurie; Raimundo Saona; Bruno Ziliotto

TL;DR

The paper proves that the uniform value exists in Doeblin hidden stochastic games, provides an algorithm to approximate it, and shows that, in the hidden setting, ergodicity does not guarantee the Doeblin condition.

Abstract

In \emph{zero-sum two-player hidden stochastic games}, players observe partial information about the state. We address: $(i)$ the existence of the \emph{uniform value}, i.e., a limiting average payoff that both players can guarantee for sufficiently long durations, and $(ii)$ the existence of an algorithm to approximate it. Previous work shows that, in the general case, the uniform value may fail to exist, and, even when it does, there need not exist an algorithm to compute or approximate it. Therefore, we consider the \emph{Doeblin condition} in hidden stochastic games, requiring that, after a sufficiently long time, the posterior beliefs have a uniformly positive probability of resetting to one of finitely many neighborhoods in the belief space. We prove the existence of the uniform value and provide an algorithm to approximate it. We identify sufficient conditions for the Doeblin condition, namely \emph{ergodicity} in the blind setting (when the signal is uninformative) and \emph{primitivity} in the hidden setting (when there are multiple signals). Moreover, we show that, in the hidden setting, ergodicity does not guarantee the Doeblin condition. Our results are new even for the one-player setting, i.e., partially observable Markov decision processes.
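To make "posterior beliefs" concrete, here is a minimal sketch of the standard Bayesian belief update in a finite hidden stochastic game. It assumes, for illustration only, that both players observe the same public signal (so they share one belief) and that the dynamics are encoded as `kernel[(i, j)][k][k2][s]`, the probability of moving from state `k` to state `k2` and emitting signal `s` under actions `i` and `j`. This encoding and the names `update_belief`, `signal_probability`, `noisy_kernel`, and `blind_kernel` are hypothetical and are not taken from the paper. With a single signal (the blind setting), the update reduces to multiplying the belief by a transition matrix, so the belief evolves deterministically given the actions.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding (an illustrative assumption, not the paper's notation):
# kernel[(i, j)][k][k2][s] is the probability of moving from state k to state k2
# and emitting the public signal s when the players choose actions i and j.
Kernel = Dict[Tuple[int, int], List[List[List[float]]]]


def signal_probability(belief: List[float], kernel: Kernel, i: int, j: int, s: int) -> float:
    """Probability of observing signal s given the current belief and actions (i, j)."""
    q = kernel[(i, j)]
    n = len(belief)
    return sum(belief[k] * q[k][k2][s] for k in range(n) for k2 in range(n))


def update_belief(belief: List[float], kernel: Kernel, i: int, j: int, s: int) -> List[float]:
    """Bayesian posterior over the next state after actions (i, j) and signal s."""
    q = kernel[(i, j)]
    n = len(belief)
    # Unnormalized posterior: sum over current states k of belief(k) * q(k -> k2, s).
    weights = [sum(belief[k] * q[k][k2][s] for k in range(n)) for k2 in range(n)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("signal s has probability zero under this belief")
    return [w / total for w in weights]


def noisy_kernel(p: List[List[float]], accuracy: float) -> List[List[List[float]]]:
    """Toy kernel (needs at least two states): move with matrix p, then the signal
    reveals the new state with probability `accuracy`, else is uniform over the rest."""
    n = len(p)
    other = (1.0 - accuracy) / (n - 1)
    return [[[p[k][k2] * (accuracy if s == k2 else other) for s in range(n)]
             for k2 in range(n)] for k in range(n)]


def blind_kernel(p: List[List[float]]) -> List[List[List[float]]]:
    """Toy kernel with a single uninformative signal (the blind setting)."""
    n = len(p)
    return [[[p[k][k2]] for k2 in range(n)] for k in range(n)]


if __name__ == "__main__":
    p = [[0.9, 0.1], [0.2, 0.8]]
    hidden = {(0, 0): noisy_kernel(p, 0.7)}
    print(update_belief([0.5, 0.5], hidden, 0, 0, 0))  # roughly [0.740, 0.260]
    blind = {(0, 0): blind_kernel(p)}
    print(update_belief([0.5, 0.5], blind, 0, 0, 0))   # roughly [0.55, 0.45]
```

In this toy example, observing signal `0` (probability 0.52) from a uniform prior moves the belief to about 0.740 on state 0, whereas in the blind version the belief simply becomes the uniform prior times the transition matrix.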

Paper Structure

This paper contains 71 sections, 8 theorems, 81 equations, 1 figure, and 1 algorithm.

Introduction
Contribution
Technique
Related Literature
Novelty
Outline
Notation
Preliminaries
Framework
Game
Related Models
Matrices
Dynamic
History
Strategy
...and 56 more sections

Key Result

Theorem 3.2

For every Doeblin hidden stochastic game:

Figures (1)

Figure 1: State transition diagram of $\Gamma$. Upward-facing triangles are states controlled by Player $1$, while downward-facing triangles are states controlled by Player $2$.

Theorems & Definitions (22)

Definition 2.1: Decision version of computing the uniform value
Definition 2.2: Decision version of approximating the uniform value
Definition 3.1: Doeblin hidden stochastic game
Theorem 3.2
Corollary 3.3
Definition 3.4: Ergodic blind stochastic game
Theorem 3.5
Definition 3.6: Primitive hidden stochastic game
Theorem 3.7
Lemma 4.1
...and 12 more
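The exact statements of Definitions 3.1, 3.4, and 3.6 are not given in this summary and are not reproduced here. As background only, the sketch below checks the classical single-matrix versions of two related notions for a finite Markov chain. A nonnegative square matrix is primitive if some power of it is entrywise positive; by Wielandt's theorem, an n-by-n primitive matrix already satisfies this at the power (n - 1)^2 + 1. A stochastic matrix P satisfies a Doeblin minorization at step m when alpha = sum_j min_i P^m[i][j] > 0, in which case every row of P^m dominates alpha times a common probability vector. The function names are hypothetical. In the classical Markov chain sense, a finite chain is irreducible and aperiodic exactly when its transition matrix is primitive, and then the Doeblin coefficient becomes positive at a large enough step.

```python
from typing import List

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_pow(a: Matrix, m: int) -> Matrix:
    """a raised to the power m >= 0 by repeated multiplication."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(m):
        result = mat_mul(result, a)
    return result


def is_primitive(a: Matrix) -> bool:
    """True if some power of the nonnegative matrix a is entrywise positive.

    Works on the zero/nonzero pattern with Boolean products, so entries never
    grow, and by Wielandt's theorem only needs the power (n - 1)**2 + 1."""
    n = len(a)
    pattern = [[x > 0 for x in row] for row in a]
    power = pattern
    for _ in range((n - 1) ** 2):
        power = [[any(power[i][k] and pattern[k][j] for k in range(n)) for j in range(n)]
                 for i in range(n)]
    return all(all(row) for row in power)


def doeblin_coefficient(p: Matrix, m: int) -> float:
    """Largest alpha with P^m[i][j] >= alpha * nu[j] for all i, j and some
    probability vector nu; it equals the sum over columns of the column minimum."""
    pm = mat_pow(p, m)
    n = len(pm)
    return sum(min(pm[i][j] for i in range(n)) for j in range(n))


if __name__ == "__main__":
    mixing = [[0.9, 0.1], [0.2, 0.8]]
    periodic = [[0.0, 1.0], [1.0, 0.0]]
    print(is_primitive(mixing), doeblin_coefficient(mixing, 1))      # True and roughly 0.3
    print(is_primitive(periodic), doeblin_coefficient(periodic, 2))  # False 0.0
```

The periodic chain is irreducible but not aperiodic, so no power of its transition matrix is positive and its Doeblin coefficient stays at zero for every step.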
