Perturbing Best Responses in Zero-Sum Games
Adam Dziwoki, Rostislav Horcik
TL;DR
This work investigates how perturbing best-response computations affects two cornerstone algorithms, both built on a best-response oracle (BRO), for approximating Nash equilibria (NE) in two-player zero-sum games. It introduces perturbed-BRO variants of Fictitious Play (SFP) and Double Oracle (SDO), analyzes their convergence under Gumbel and uniform perturbations, and shows that SFP reaches an $\varepsilon$-NE in $\mathcal{O}\left(\frac{\log n}{\varepsilon^2}\right)$ expected iterations, while SDO needs $\mathcal{O}(\log n)$ expected iterations on certain challenging examples. The paper also develops efficient perturbation schemes for structured games (POSGs, EFGs) that perturb only terminal rewards or transitions, using clustering to reduce overhead; these methods speed up convergence in several stochastic-game and grid-path experiments. Overall, perturbations extend the practical reach of BRO-based NE algorithms, enabling faster convergence in large or structured zero-sum games. Whether the guarantees for DO under perturbations hold in full generality remains an open question.
Abstract
This paper investigates the impact of perturbations on best-response-based algorithms for approximating Nash equilibria in zero-sum games, namely Double Oracle and Fictitious Play. More precisely, we assume that the oracle computing the best responses perturbs the utilities before selecting the best response. We show that using such an oracle reduces the number of iterations for both algorithms. In some cases, suitable perturbations ensure that the expected number of iterations is logarithmic. Although perturbing the utilities is computationally demanding, since it requires iterating through all pure strategies, we demonstrate that the utilities can be perturbed efficiently in games whose pure strategies have further inner structure.
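
To make the idea of a perturbed oracle concrete, the following is a minimal sketch of Fictitious Play on a matrix game in which each best response is chosen after adding i.i.d. Gumbel noise to the expected utilities. It is an illustration under stated assumptions, not the paper's exact procedure: the function names (`perturbed_best_response`, `perturbed_fictitious_play`, `exploitability`), the fixed noise `scale`, the choice to perturb utilities against the opponent's empirical average, and the initialization are all hypothetical.

```python
import numpy as np


def perturbed_best_response(utilities, rng, scale):
    """Pick the action maximizing utility plus i.i.d. Gumbel noise.

    Assumption: one noise sample per pure strategy, with a fixed scale.
    """
    noise = rng.gumbel(loc=0.0, scale=scale, size=utilities.shape)
    return int(np.argmax(utilities + noise))


def perturbed_fictitious_play(A, iterations, scale=0.1, seed=0):
    """Fictitious Play on payoff matrix A (row player maximizes x^T A y).

    Each player best-responds, via the perturbed oracle, to the
    opponent's empirical mixture of past plays.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_counts = np.zeros(m)
    col_counts = np.zeros(n)
    # Arbitrary initial pure strategies (an assumption of this sketch).
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(iterations):
        x = row_counts / row_counts.sum()
        y = col_counts / col_counts.sum()
        i = perturbed_best_response(A @ y, rng, scale)     # row player
        j = perturbed_best_response(-(x @ A), rng, scale)  # column player
        row_counts[i] += 1
        col_counts[j] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()


def exploitability(A, x, y):
    """Gap between the unperturbed best-response values of both players.

    The pair (x, y) is an epsilon-NE when this gap is at most epsilon.
    """
    return float(np.max(A @ y) - np.min(x @ A))


if __name__ == "__main__":
    # Rock-paper-scissors as a small zero-sum test game.
    rps = np.array([[0.0, -1.0, 1.0],
                    [1.0, 0.0, -1.0],
                    [-1.0, 1.0, 0.0]])
    x, y = perturbed_fictitious_play(rps, iterations=2000)
    print("row strategy:", x)
    print("column strategy:", y)
    print("exploitability:", exploitability(rps, x, y))
```

Note that the oracle draws one noise sample per pure strategy, which is the cost that becomes prohibitive when the strategy set is large and motivates perturbing structured games more economically.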
