Efficiently Solving Turn-Taking Stochastic Games with Extensive-Form Correlation
Hanrui Zhang, Yu Cheng, Vincent Conitzer
TL;DR
The paper addresses efficient computation of equilibria with extensive-form correlation in two-player turn-taking stochastic games given in graph form. It introduces a polynomial-time algorithm for SEFCE based on a maximum-punishment principle and a reduction to constrained planning, together with a recursive Pareto-frontier framework that evaluates only pivotal points in order to bound the number of evaluations. For EFCE, it presents a bi-criteria algorithm that computes an $\varepsilon$-optimal EFCE in time polynomial in the size of the game and $\log(1/\varepsilon)$, using approximate pivotal points and constrained planning. Together, the results extend equilibrium computation with commitment and correlation to succinct graph-form stochastic games, going beyond tree-form representations and no-chance-move restrictions, and they support planning with participation-like constraints in dynamic environments.
Abstract
We study equilibrium computation with extensive-form correlation in two-player turn-taking stochastic games. Our main results are two-fold: (1) We give an algorithm for computing a Stackelberg extensive-form correlated equilibrium (SEFCE), which runs in time polynomial in the size of the game, as well as the number of bits required to encode each input number. (2) We give an efficient algorithm for approximately computing an optimal extensive-form correlated equilibrium (EFCE) up to machine precision, i.e., the algorithm achieves approximation error $\varepsilon$ in time polynomial in the size of the game, as well as $\log(1 / \varepsilon)$. Our algorithm for SEFCE is the first polynomial-time algorithm for equilibrium computation with commitment in such a general class of stochastic games. Existing algorithms for SEFCE typically make stronger assumptions, such as no chance moves, and are designed for extensive-form games in the less succinct tree form. Our algorithm for approximately optimal EFCE is, to our knowledge, the first algorithm that achieves three desiderata simultaneously: approximate optimality, polylogarithmic dependence on the approximation error, and compatibility with stochastic games in the more succinct graph form. Existing algorithms achieve at most two of these desiderata, often also relying on additional technical assumptions.
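As a compact restatement of the two running-time guarantees above, one might write the following. The symbols $|G|$ (size of the game), $b$ (number of bits needed to encode each input number), and $T$ (running time) are notation assumed here for illustration and are not taken from the paper.

$$
T_{\mathrm{SEFCE}} = \mathrm{poly}\bigl(|G|,\, b\bigr),
\qquad
T_{\mathrm{EFCE}}(\varepsilon) = \mathrm{poly}\bigl(|G|,\, \log(1/\varepsilon)\bigr).
$$

Under this reading, a target error of $\varepsilon = 2^{-k}$ costs time polynomial in $|G|$ and $k$, which is the sense in which the EFCE algorithm is accurate "up to machine precision".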
