Optimism Without Regularization: Constant Regret in Zero-Sum Games
John Lazarsfeld, Georgios Piliouras, Ryann Sim, Stratis Skoulakis
TL;DR
This work shows that unregularized Optimistic Fictitious Play (OFP) achieves constant $O(1)$ regret in 2x2 two-player zero-sum games with a unique interior Nash equilibrium, answering the question of whether regularization is necessary for such rates. The authors introduce a geometric energy framework in the dual space of payoff vectors, prove a uniform bound on this energy along the OFP iterates, and derive the $O(1)$ regret bound from it. They further prove an $\Omega(\sqrt{T})$ regret lower bound for Alternating Fictitious Play, which separates optimism from alternation in the unregularized regime. Experiments suggest similar behavior in larger games. The results indicate that optimism can drive fast learning in zero-sum games even without regularization or a bounded step size, with potential implications for equilibrium computation and self-play in multi-agent settings.
Abstract
This paper studies the optimistic variant of Fictitious Play for learning in two-player zero-sum games. While it is known that Optimistic FTRL -- a regularized algorithm with a bounded stepsize parameter -- obtains constant regret in this setting, we show for the first time that similar, optimal rates are also achievable without regularization: we prove for two-strategy games that Optimistic Fictitious Play (using any tiebreaking rule) obtains only constant regret, providing surprising new evidence on the ability of non-no-regret algorithms for fast learning in games. Our proof technique leverages a geometric view of Optimistic Fictitious Play in the dual space of payoff vectors, where we show that a certain energy function of the iterates remains bounded over time. We also prove a regret lower bound of $\Omega(\sqrt{T})$ for Alternating Fictitious Play. In the unregularized regime, this separates the ability of optimism and alternation in achieving $o(\sqrt{T})$ regret.
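To make the algorithm concrete, the following is a minimal sketch of Optimistic Fictitious Play on Matching Pennies, a two-strategy zero-sum game with a unique interior Nash equilibrium. The abstract does not spell out the update rule, so the sketch assumes the usual optimistic form: each player best-responds to its cumulative payoff vector plus one extra copy of its most recent payoff vector. The function names, the lowest-index tiebreaking rule, and the choice of game are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of Optimistic Fictitious Play (OFP) in a zero-sum game.
# Assumption: the row player maximizes x^T A y and the column player minimizes it.

MATCHING_PENNIES = [[1, -1], [-1, 1]]


def argmax_lowest_index(values):
    """Index of the largest entry; ties go to the lowest index (arbitrary choice)."""
    return max(range(len(values)), key=lambda k: values[k])


def optimistic_fictitious_play(A, T):
    """Run the assumed OFP update for T rounds and return (row_regret, col_regret)."""
    n, m = len(A), len(A[0])
    cum_row, cum_col = [0] * n, [0] * m    # cumulative payoff vectors
    last_row, last_col = [0] * n, [0] * m  # most recent payoff vectors
    realized = 0                           # row player's realized total payoff

    for _ in range(T):
        # Optimism: best-respond to cumulative payoffs with the last payoff counted twice.
        i = argmax_lowest_index([c + g for c, g in zip(cum_row, last_row)])
        j = argmax_lowest_index([c + g for c, g in zip(cum_col, last_col)])

        realized += A[i][j]
        last_row = [A[k][j] for k in range(n)]   # row payoffs against column action j
        last_col = [-A[i][k] for k in range(m)]  # column payoffs against row action i
        cum_row = [c + g for c, g in zip(cum_row, last_row)]
        cum_col = [c + g for c, g in zip(cum_col, last_col)]

    # Regret: best fixed action in hindsight minus the realized payoff.
    row_regret = max(cum_row) - realized
    col_regret = max(cum_col) + realized  # column player's realized payoff is -realized
    return row_regret, col_regret


if __name__ == "__main__":
    for horizon in (100, 1000, 10000):
        print(horizon, optimistic_fictitious_play(MATCHING_PENNIES, horizon))
```

Running the script for several horizons gives a quick sanity check of the constant-regret claim on this one game and this one tiebreaking rule: the printed regrets should not grow with the horizon. It is not a substitute for the proof, which covers any tiebreaking rule.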
