No-regret learning in harmonic games: Extrapolation in the face of conflicting interests

Davide Legacci, Panayotis Mertikopoulos, Christos H. Papadimitriou, Georgios Piliouras, Bary S. R. Pradelski

TL;DR

This work analyzes no-regret learning in harmonic games, the strategic counterpart to potential games with conflicting interests. It shows that continuous-time FTRL dynamics are Poincaré recurrent in harmonic games, precluding convergence, while vanilla discrete-time FTRL can diverge. For an extrapolated variant, FTRL+, the authors prove order-optimal $O(1)$ regret and convergence of empirical play to the set of coarse correlated equilibria (CCE) at rate $O(1/T)$, and, under smooth regularizers and suitable learning-rate bounds, convergence of strategy profiles to Nash equilibria. The results extend known two-player zero-sum dynamics to general $N$-player harmonic games, establishing harmonic games as the dynamic complement of potential games and opening avenues for adaptive and bandit extensions.

Abstract

The long-run behavior of multi-agent learning - and, in particular, no-regret learning - is relatively well-understood in potential games, where players have aligned interests. By contrast, in harmonic games - the strategic counterpart of potential games, where players have conflicting interests - very little is known outside the narrow subclass of 2-player zero-sum games with a fully-mixed equilibrium. Our paper seeks to partially fill this gap by focusing on the full class of (generalized) harmonic games and examining the convergence properties of follow-the-regularized-leader (FTRL), the most widely studied class of no-regret learning schemes. As a first result, we show that the continuous-time dynamics of FTRL are Poincaré recurrent, that is, they return arbitrarily close to their starting point infinitely often, and hence fail to converge. In discrete time, the standard, "vanilla" implementation of FTRL may lead to even worse outcomes, eventually trapping the players in a perpetual cycle of best-responses. However, if FTRL is augmented with a suitable extrapolation step - which includes as special cases the optimistic and mirror-prox variants of FTRL - we show that learning converges to a Nash equilibrium from any initial condition, and all players are guaranteed at most O(1) regret. These results provide an in-depth understanding of no-regret learning in harmonic games, nesting prior work on 2-player zero-sum games, and showing at a high level that harmonic games are the canonical complement of potential games, not only from a strategic, but also from a dynamic viewpoint.
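
The contrast between vanilla and extrapolated FTRL described in the abstract can be illustrated with a minimal sketch on Matching Pennies, a 2-player zero-sum game with a fully mixed equilibrium. The sketch assumes an entropic regularizer (so the choice map is a softmax) and uses the standard optimistic update, in which each player extrapolates with the last observed payoff vector. The function names (`softmax`, `payoffs`, `run`), the step size, and the initial scores are illustrative choices, not taken from the paper, and the update is not claimed to match the paper's FTRL+ definition in every detail.

```python
import math

# Matching Pennies: player 1 wants to match, player 2 wants to mismatch.
# A[a][b] is player 1's payoff; player 2 receives -A[a][b] (zero-sum).
A = [[1.0, -1.0], [-1.0, 1.0]]


def softmax(y):
    """Logit choice map: the FTRL mirror map for the entropic regularizer."""
    m = max(y)
    e = [math.exp(v - m) for v in y]
    s = sum(e)
    return [v / s for v in e]


def payoffs(x1, x2):
    """Payoff vectors (v_1(x), v_2(x)) at the mixed profile x = (x1, x2)."""
    v1 = [sum(A[a][b] * x2[b] for b in range(2)) for a in range(2)]
    v2 = [-sum(A[a][b] * x1[a] for a in range(2)) for b in range(2)]
    return v1, v2


def run(steps, eta, optimistic, y1=(0.5, 0.0), y2=(0.0, 0.3)):
    """Run FTRL and return the sup-norm distance of the final base state
    softmax(y) from the equilibrium (1/2, 1/2). Initial scores are arbitrary."""
    y1, y2 = list(y1), list(y2)
    g1, g2 = [0.0, 0.0], [0.0, 0.0]  # last observed payoffs, used as a prediction
    for _ in range(steps):
        if optimistic:
            # Extrapolation step: play from scores shifted by the predicted payoff.
            x1 = softmax([y + eta * g for y, g in zip(y1, g1)])
            x2 = softmax([y + eta * g for y, g in zip(y2, g2)])
        else:
            # Vanilla FTRL: play directly from the aggregate scores.
            x1, x2 = softmax(y1), softmax(y2)
        g1, g2 = payoffs(x1, x2)
        # Score update: aggregate the payoffs of the strategies actually played.
        y1 = [y + eta * g for y, g in zip(y1, g1)]
        y2 = [y + eta * g for y, g in zip(y2, g2)]
    x1, x2 = softmax(y1), softmax(y2)
    return max(abs(x1[0] - 0.5), abs(x2[0] - 0.5))


print("initial distance:", run(0, 0.1, optimistic=False))
print("vanilla FTRL:    ", run(2000, 0.1, optimistic=False))
print("optimistic FTRL: ", run(2000, 0.1, optimistic=True))
```

With these settings, the vanilla iterates drift away from the equilibrium while the optimistic iterates contract toward it, in line with the qualitative behavior the abstract describes.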

Paper Structure

This paper contains 33 sections, 31 theorems, 104 equations, and 3 figures.

Key Result

Theorem 1

Under the continuous-time FTRL dynamics (FTRL-cont), each player's regret is bounded as $\mathrm{Reg}_{i}(T) \leq H_{i} := \max h_{i} - \min h_{i}$, where $h_{i}$ is player $i$'s regularizer.
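
For orientation, the following sketch shows how a bound of this form is typically derived. It uses the standard formulation of continuous-time FTRL and assumes a unit learning rate and the initialization $y_{i}(0) = 0$. These assumptions, and the notation $Q_{i}$, $h_{i}^{*}$, $\mathcal{X}_{i}$, are the usual ones and may differ from the paper's exact statement.

```latex
% Continuous-time FTRL (standard form; y_i(0) = 0 is an assumption made here)
\dot y_{i}(t) = v_{i}(x(t)),
\qquad
x_{i}(t) = Q_{i}(y_{i}(t))
  = \operatorname*{arg\,max}_{x_{i} \in \mathcal{X}_{i}}
    \{ \langle y_{i}(t), x_{i} \rangle - h_{i}(x_{i}) \}.

% Regret against the best fixed strategy (payoffs are linear in x_i)
\mathrm{Reg}_{i}(T)
  = \max_{p_{i} \in \mathcal{X}_{i}}
    \int_{0}^{T} \langle v_{i}(x(t)),\, p_{i} - x_{i}(t) \rangle \, dt.

% With h_i^* the convex conjugate of h_i, we have
% \frac{d}{dt} h_i^*(y_i(t)) = \langle v_i(x(t)), x_i(t) \rangle, hence
\int_{0}^{T} \langle v_{i}(x(t)),\, p_{i} - x_{i}(t) \rangle \, dt
  = \langle y_{i}(T), p_{i} \rangle - h_{i}^{*}(y_{i}(T)) + h_{i}^{*}(0)
  \leq h_{i}(p_{i}) - \min h_{i}
  \leq \max h_{i} - \min h_{i}.
```

The first inequality is the Fenchel-Young inequality $\langle y, p \rangle - h_{i}^{*}(y) \leq h_{i}(p)$ combined with $h_{i}^{*}(0) = -\min h_{i}$. The resulting bound does not depend on the horizon $T$.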

Figures (3)

  • Figure 1: The evolution of vanilla vs. extrapolated FTRL schemes in harmonic games. Left: the game of Matching Pennies (blue: FTRL+; green: FTRL; red: continuous-time FTRL). Center and right: two different orbits in a $2\times2\times2$ harmonic game, seen from two different viewpoints (blue: FTRL+; green/orange: FTRL; payoff profiles on vertices). In all cases, we ran the optimistic variant of FTRL+ ($\lambda_{i}=0$ for all players); the trajectories of (FTRL) diverge away from equilibrium and those of (FTRL-cont) are recurrent (in fact, periodic), whereas (FTRL+) converges. We also see the highly non-convex structure of harmonic games, as evidenced by their equilibrium set (thick red line in the center and right subfigures).
  • Figure 2: Representation of the harmonic payoff structure for the game in the paper's harmonic-game example. Each payoff vector $v(x)$ (black arrows) is perpendicular, with respect to a weighted inner product, to the vector $x - q$ (dotted segment) between the evaluation point $x$ of the payoff field and the fully mixed Nash equilibrium $q$ (red point). As a consequence, every orbit of FTRL in continuous time (such as the one represented by the black curve) is Poincaré recurrent (in this low-dimensional example, even periodic), as detailed in the recurrence theorem in the main text. Color shading and dotted lines represent player $1$'s utility level sets, with brighter regions indicating higher payoffs.
  • Figure 3: Commutative diagram of the maps discussed in the appendix sections labeled sec:continuous-ftrl-app, sec:constant-fenchel, and sec:ftrl-z-space; note in particular that $v \circ Q$ is a vector field on $\mathcal{Y}$. The notation $\mathcal{X} \hookrightarrow \mathcal{V}$ is equivalent to $\mathcal{X} \subseteq \mathcal{V}$.

Theorems & Definitions (65)

  • Remark
  • Definition 1
  • Remark
  • Theorem 1
  • Theorem 2
  • Remark
  • Theorem 3
  • Theorem 4
  • Definition A.1
  • Lemma A.2
  • ...and 55 more