Playing against a stationary opponent
Julien Grand-Clément, Nicolas Vieille
TL;DR
The paper studies how simple a strategy can be while remaining Blackwell $\varepsilon$-optimal in a zero-sum stochastic game whose opponent is restricted to stationary strategies, a setting motivated by robust MDPs. The results cut both ways. In product absorbing games, Markovian Blackwell $\varepsilon$-optimal strategies may fail to exist, yet a Blackwell $\varepsilon$-optimal strategy can always be implemented by a two-state autonomous automaton, that is, one whose internal transitions do not depend on the actions played. In general absorbing games, by contrast, there may be no blind Blackwell $\varepsilon$-optimal strategy, that is, none that ignores the opponent's decisions. The analysis uses concrete constructions (the Big Match and variants) to separate the power of Markovian, blind, and automaton-based strategies, and it gives explicit formulas for limiting payoffs under stationary responses. The findings highlight a sharp contrast between absorbing games and generalized Big Match games and offer new insights for designing robust MDP policies that are simple yet near-optimal against stationary disturbances.
Abstract
This paper investigates properties of Blackwell $\varepsilon$-optimal strategies in zero-sum stochastic games when the adversary is restricted to stationary strategies, motivated by applications to robust Markov decision processes. For a class of absorbing games, we show that Markovian Blackwell $\varepsilon$-optimal strategies may fail to exist, yet we prove the existence of Blackwell $\varepsilon$-optimal strategies that can be implemented by a two-state automaton whose internal transitions are independent of actions. For more general absorbing games, however, there need not exist Blackwell $\varepsilon$-optimal strategies that are independent of the adversary's decisions. Our findings point to a contrast between absorbing games and generalized Big Match games, and provide new insights into the properties of optimal policies for robust Markov decision processes.
