Playing against a stationary opponent

Julien Grand-Clément, Nicolas Vieille

TL;DR

The paper addresses how simple strategies can approximate the discounted value in zero-sum stochastic games when the opponent is restricted to stationary strategies, with implications for robust MDPs. It proves a mixed set of results: in product absorbing games, Blackwell $\varepsilon$-optimal strategies can always be realized by a two-state autonomous automaton, even though Markovian strategies may fail to be optimal; in contrast, general absorbing games can admit no blind Blackwell $\varepsilon$-optimal strategy. The analysis uses concrete game constructions (Big Match and variants) to separate cases where Markovian, blind, and automaton-based strategies differ in power, and it gives explicit formulas for limiting payoffs under stationary responses. The findings highlight a sharp contrast between absorbing games and generalized Big Match games and offer new insights for designing robust MDP policies that are simple yet near-optimal against stationary disturbances.

Abstract

This paper investigates properties of Blackwell $\varepsilon$-optimal strategies in zero-sum stochastic games when the adversary is restricted to stationary strategies, motivated by applications to robust Markov decision processes. For a class of absorbing games, we show that Markovian Blackwell $\varepsilon$-optimal strategies may fail to exist, yet we prove the existence of Blackwell $\varepsilon$-optimal strategies that can be implemented by a two-state automaton whose internal transitions are independent of actions. For more general absorbing games, however, there need not exist Blackwell $\varepsilon$-optimal strategies that are independent of the adversary's decisions. Our findings point to a contrast between absorbing games and generalized Big Match games, and provide new insights into the properties of optimal policies for robust Markov decision processes.
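To make the phrase "two-state automaton whose internal transitions are independent of actions" concrete, here is a minimal Python sketch of that kind of object. It is an illustration only, not the construction from the paper. The class name `TwoStateAutomaton`, the helper `state_path`, the action labels `"T"`/`"B"`, and all probabilities are hypothetical. "Independent of actions" is read here as: the internal state follows its own two-state Markov chain and never observes what either player does.

```python
import random


class TwoStateAutomaton:
    """Hypothetical strategy automaton with two internal states (0 and 1).

    Assumption: the internal state evolves by its own Markov chain and
    ignores all actions; only the action played depends on the state.
    """

    def __init__(self, switch_prob, action_dist, seed=0):
        self.switch_prob = switch_prob    # (P(leave state 0), P(leave state 1))
        self.action_dist = action_dist    # one mixed action per internal state
        self.state = 0
        self._clock = random.Random(seed)       # drives internal transitions only
        self._dice = random.Random(seed + 1)    # drives action sampling only

    def act(self):
        """Sample an action from the mixed action attached to the current state."""
        dist = self.action_dist[self.state]
        actions = list(dist)
        weights = [dist[a] for a in actions]
        return self._dice.choices(actions, weights=weights)[0]

    def step(self):
        """Move the internal state; no action is passed in or consulted."""
        if self._clock.random() < self.switch_prob[self.state]:
            self.state = 1 - self.state


def state_path(opponent_actions, seed=0):
    """Record the internal state at each stage of a play (illustrative numbers)."""
    auto = TwoStateAutomaton(
        switch_prob=(0.1, 0.3),
        action_dist=({"T": 0.05, "B": 0.95}, {"T": 0.0, "B": 1.0}),
        seed=seed,
    )
    path = []
    for _opp in opponent_actions:   # the opponent's action is deliberately unused
        path.append(auto.state)
        auto.act()
        auto.step()
    return path
```

With a fixed seed, `state_path(["L"] * 50)` and `state_path(["R"] * 50)` return the same sequence of internal states, which is the sense in which the automaton's transitions do not depend on the opponent's play in this sketch.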

Paper Structure

This paper contains 19 sections, 10 theorems, 36 equations.

Key Result

Theorem 2.1

The following results hold:

Theorems & Definitions (15)

  • Definition 1
  • Definition 2
  • Theorem 2.1
  • Proposition 2.1
  • Proposition 3.1
  • Lemma 1
  • Corollary 1
  • Lemma 2
  • Lemma 3
  • Proposition 3.2
  • ...and 5 more