
Optimism Without Regularization: Constant Regret in Zero-Sum Games

John Lazarsfeld, Georgios Piliouras, Ryann Sim, Stratis Skoulakis

TL;DR

This work shows that unregularized Optimistic Fictitious Play (OFP) achieves constant regret, $O(1)$, in two-player zero-sum games with a unique interior Nash equilibrium, answering a long-standing question about whether regularization is necessary. The authors introduce a geometric energy framework in the dual space of payoff vectors, prove a uniform absolute bound on this dual energy, and thereby establish $O(1)$ regret for OFP in the $2\times2$ setting. They further prove an $\Omega(\sqrt{T})$ lower bound for Alternating Fictitious Play, separating optimism from alternation in the absence of regularization. Together with experiments suggesting similar behavior in larger games, the results indicate that optimism can drive fast learning in zero-sum games even without regularization or a finite step size, with potential implications for equilibrium computation and self-play in multi-agent settings.

Abstract

This paper studies the optimistic variant of Fictitious Play for learning in two-player zero-sum games. While it is known that Optimistic FTRL -- a regularized algorithm with a bounded stepsize parameter -- obtains constant regret in this setting, we show for the first time that similar, optimal rates are also achievable without regularization: we prove for two-strategy games that Optimistic Fictitious Play (using any tiebreaking rule) obtains only constant regret, providing surprising new evidence of the ability of non-no-regret algorithms to learn fast in games. Our proof technique leverages a geometric view of Optimistic Fictitious Play in the dual space of payoff vectors, where we show that a certain energy function of the iterates remains bounded over time. Additionally, we prove a regret lower bound of $\Omega(\sqrt{T})$ for Alternating Fictitious Play. In the unregularized regime, this separates the ability of optimism and alternation to achieve $o(\sqrt{T})$ regret.
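The two update rules compared throughout can be sketched in a few lines. The following is a minimal simulation, assuming the common formulation in which standard FP best-responds to the cumulative payoff vector $\sum_{s\le t} u_i^s$ and OFP adds the most recent payoff vector $u_i^t$ once more as an optimistic prediction of the next one. The function names (`best_response`, `matvec`, `run`) and the regret bookkeeping are illustrative assumptions, not the authors' code. Ties are broken lexicographically (first maximizing index), as in Figure 1, and the returned value is the total regret $\mathrm{Reg}_1(T)+\mathrm{Reg}_2(T)$, each measured against the best fixed strategy in hindsight.

```python
def best_response(scores):
    """Pure best response (one-hot vector); ties go to the first maximizing index."""
    i = max(range(len(scores)), key=lambda k: scores[k])
    return [1.0 if k == i else 0.0 for k in range(len(scores))]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def run(A, x1, x2, T, optimistic):
    """Run FP (optimistic=False) or OFP (optimistic=True) for T rounds.

    Player 1 maximizes x1^T A x2; Player 2 receives the negated payoff.
    Returns the total regret Reg1(T) + Reg2(T).
    """
    neg_At = [[-A[i][j] for i in range(len(A))] for j in range(len(A[0]))]
    cum1, cum2 = [0.0] * len(A), [0.0] * len(A[0])
    realized1 = realized2 = 0.0
    for _ in range(T):
        u1 = matvec(A, x2)        # Player 1's payoff vector this round
        u2 = matvec(neg_At, x1)   # Player 2's payoff vector this round
        realized1 += sum(a * b for a, b in zip(x1, u1))
        realized2 += sum(a * b for a, b in zip(x2, u2))
        cum1 = [c + u for c, u in zip(cum1, u1)]
        cum2 = [c + u for c, u in zip(cum2, u2)]
        if optimistic:
            # OFP: count the latest payoff vector twice (optimistic prediction)
            x1 = best_response([c + u for c, u in zip(cum1, u1)])
            x2 = best_response([c + u for c, u in zip(cum2, u2)])
        else:
            # FP: best respond to the cumulative payoff vector alone
            x1 = best_response(cum1)
            x2 = best_response(cum2)
    # Linear payoffs: the best fixed strategy in hindsight is a pure strategy
    return (max(cum1) - realized1) + (max(cum2) - realized2)


MP = [[1.0, -1.0], [-1.0, 1.0]]   # Matching Pennies
for T in (100, 1000, 10000):
    fp = run(MP, [1/3, 2/3], [2/3, 1/3], T, optimistic=False)
    ofp = run(MP, [1/3, 2/3], [2/3, 1/3], T, optimistic=True)
    print(f"T={T:>5}  FP regret={fp:8.2f}  OFP regret={ofp:8.2f}")
```

Because the game is zero-sum, the realized payoffs of the two players cancel, so the total regret reduces to `max(cum1) + max(cum2)`. Printing the totals for growing $T$ lets one compare the growth of the two dynamics with the trends reported in Figure 1.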

Paper Structure

This paper contains 62 sections, 32 theorems, 71 equations, 6 figures, 3 tables.

Key Result

Proposition 2.0

Let $\widetilde{x}^T_1 = \frac{1}{T}\sum_{t=0}^T x^t_1$ and $\widetilde{x}^T_2 = \frac{1}{T}\sum_{t=0}^T x^t_2$ denote the time-average iterates of Players 1 and 2, respectively, and suppose $\textnormal{Reg}(T) = o(T)$. Then $(\widetilde{x}^T_1, \widetilde{x}^T_2)$ converges (in duality gap) to a Nash equilibrium of the game.
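Proposition 2.0 follows from a standard identity linking total regret to the duality gap of the time averages. A sketch, assuming $\mathrm{Reg}(T)$ denotes the sum of both players' regrets, Player 1 maximizes $x_1^\top A x_2$ over the simplex $\Delta_n$ and Player 2 minimizes it over $\Delta_m$, and rounds are indexed $t=1,\dots,T$ (indexing from $t=0$, as in the proposition, only changes the normalizing constant):

```latex
\begin{aligned}
\mathrm{Reg}_1(T) &= \max_{x \in \Delta_n} \sum_{t=1}^{T} x^\top A x_2^t \;-\; \sum_{t=1}^{T} (x_1^t)^\top A x_2^t, \\
\mathrm{Reg}_2(T) &= \max_{y \in \Delta_m} \sum_{t=1}^{T} -(x_1^t)^\top A y \;+\; \sum_{t=1}^{T} (x_1^t)^\top A x_2^t, \\
\mathrm{Reg}(T) = \mathrm{Reg}_1(T) + \mathrm{Reg}_2(T)
  &= T \Big( \max_{x \in \Delta_n} x^\top A \bar{x}_2 \;-\; \min_{y \in \Delta_m} \bar{x}_1^\top A y \Big)
   = T \cdot \mathrm{DG}(\bar{x}_1, \bar{x}_2),
  \qquad \bar{x}_i = \frac{1}{T} \sum_{t=1}^{T} x_i^t .
\end{aligned}
```

The realized-payoff terms cancel because the game is zero-sum. Since $\mathrm{DG} \ge 0$, with equality exactly at Nash equilibria, $\mathrm{Reg}(T) = o(T)$ forces the gap to vanish, and a constant regret bound (as proved for OFP in $2\times2$ games) gives a gap of order $O(1/T)$.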

Figures (6)

  • Figure 1: Empirical regret of standard (FP), Optimistic (OFP), and Alternating (AFP) Fictitious Play in Matching Pennies (from $x^0_1 = (1/3, 2/3)$, $x^0_2 = (2/3, 1/3)$), on the $15\times15$ identity matrix (from $x^0_1 = e_1$, $x^0_2 = e_n$), and on $15\times15$ generalized Rock-Paper-Scissors (from $x^0_1 = e_1$, $x^0_2 = e_n$). Each algorithm was run for $T=10000$ iterations using a lexicographical tiebreaking rule. Each panel shows constant empirical regret for OFP, compared with roughly $\sqrt{T}$ regret growth for standard FP and AFP. More experimental details and results are given in the concluding section and the experiments appendix.
  • Figure 2: Examples of the sets in $\mathcal{P}$, $\widetilde{\mathcal{P}}$, and $\widehat{\mathcal{P}}$.
  • Figure 3: Visual intuition for the claims and proof of the lemma on energy invariants.
  • Figure 4: Empirical regret of standard FP and Optimistic FP (OFP) using randomized tiebreaking on three $15\times15$ payoff matrices. For each payoff matrix, each algorithm was initialized from $x^0_1 = e_1, x^0_2 = e_n$ and run for $T=10000$ iterations.
  • Figure 5: Empirical regret of standard FP and Optimistic FP (OFP) using lexicographical tiebreaking on three $25\times25$ payoff matrices. For each payoff matrix, each algorithm was initialized from $x^0_1 = e_1, x^0_2 = e_n$ and run for $T=10000$ iterations.
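Remark 2.0 and Figures 1, 4, and 5 distinguish lexicographical from randomized tiebreaking; either rule only matters when several pure strategies maximize the (predicted) cumulative payoff. Below is a minimal sketch of a best-response oracle supporting both, assuming "lexicographical" means choosing the smallest maximizing index and "randomized" means a uniform choice among maximizers. The `rule` switch and the helper names are hypothetical.

```python
import random


def argmax_set(scores, tol=0.0):
    """Indices whose score is within `tol` of the maximum (exact ties by default)."""
    best = max(scores)
    return [k for k, s in enumerate(scores) if s >= best - tol]


def best_response(scores, rule="lex", rng=None):
    """Pure best response (one-hot vector) with a hypothetical tiebreaking switch.

    rule="lex":    smallest maximizing index (lexicographical tiebreaking)
    rule="random": uniformly random maximizing index (randomized tiebreaking)
    """
    ties = argmax_set(scores)
    if rule == "lex":
        i = ties[0]
    elif rule == "random":
        chooser = rng if rng is not None else random
        i = chooser.choice(ties)
    else:
        raise ValueError(f"unknown tiebreaking rule: {rule!r}")
    return [1.0 if k == i else 0.0 for k in range(len(scores))]


n = 15
u2 = [-1.0] + [0.0] * (n - 1)   # -A^T e_1 when A is the n x n identity
print(argmax_set(u2))                                      # indices 1..14 tie
print(best_response(u2, "lex").index(1.0))                 # lexicographical pick
print(best_response(u2, "random", random.Random(0)).index(1.0))  # randomized pick
```

On the $15\times15$ identity matrix started from $x^0_1 = e_1$ (the initialization used in Figure 1), Player 2's first payoff vector $-A^\top e_1 = -e_1$ ties 14 strategies, so the two rules can diverge at the very first step. Exact float comparison is used by default; the `tol` argument is an illustrative knob, not something the source specifies.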

Theorems & Definitions (58)

  • Proposition 2.0
  • Remark 2.0: Tiebreaking Rules
  • Definition 2.0
  • Theorem 3.1
  • Theorem 3.2
  • Definition 3.2
  • Proposition 3.2
  • Theorem 3.3
  • Proposition 3.3
  • Proposition 4.0 (Bailey and Piliouras, 2019)