Synchronization in Learning in Periodic Zero-Sum Games Triggers Divergence from Nash Equilibrium

Yuma Fujimoto; Kaito Ariu; Kenshi Abe

TL;DR

This work studies learning in periodic zero-sum games where the Nash equilibrium moves over time. By casting gradient-descent-ascent as a linear dynamical system and analyzing its eigenstructure, the authors show a synchronization phenomenon: when the average learning speed, captured by the eigenvalue $\alpha$, aligns with the game frequency $\omega$ (i.e., $\alpha/\omega=1$), the time-average of players’ strategies diverges away from the time-average Nash equilibrium; otherwise the dynamics form complex cycles but their time-average converges. The results are derived in a 2×2 setting via the eigenvalue invariant game and extended to general 2×2 smooth periodic games, with further experiments confirming robustness to higher action counts, boundary constraints, non-smooth waves, and polymatrix extensions. This reveals a universal resonance-driven mechanism in learning dynamics under time-varying environments, with implications for tracking evolving equilibria and designing algorithms robust to seasonal or cyclical changes.

Abstract

Learning in zero-sum games studies a situation where multiple agents competitively learn their strategies. In such multi-agent learning, the strategies often cycle around their optimum, i.e., the Nash equilibrium. When a game varies periodically (a "periodic" game), however, the Nash equilibrium generically moves. How learning dynamics behave in such periodic games is of interest but still unclear. We discover that the behavior depends strongly on the relationship between two speeds: the speed at which the game changes and the speed at which players learn. When these two speeds synchronize, the learning dynamics diverge, and their time-average does not converge. Otherwise, the learning dynamics draw complicated cycles, but their time-average converges. Under some assumptions introduced for the dynamical systems analysis, we prove that this behavior occurs. Furthermore, our experiments observe this behavior even when these assumptions are removed. This study discovers a novel phenomenon, i.e., synchronization, and gains insight widely applicable to learning in periodic games.
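The contrast between synchronized and non-synchronized learning can be reproduced with a small simulation. The following is a minimal sketch under assumptions that are not taken from the paper: an unconstrained linear model $\dot{x}=\alpha(y-y^{*})$, $\dot{y}=-\alpha(x-x^{*}(t))$ in which only X's equilibrium strategy oscillates, $x^{*}(t)=1/2+\varepsilon\cos\omega t$, with a hypothetical amplitude $\varepsilon=0.1$ and horizon $T=200$. The function name `simulate` and its parameters are illustrative. The Runge-Kutta fourth-order integrator, the step size of $1/40$, and the initialization at the Nash equilibrium of the time-average game follow the setup described in the caption of Figure 1.

```python
import math


def simulate(alpha, omega, eps=0.1, T=200.0, dt=1 / 40):
    """Integrate a toy linear learning model with the classical RK4 scheme.

    Assumed model (illustrative, not the paper's exact game):
        dx/dt =  alpha * (y - y_star)
        dy/dt = -alpha * (x - x_star(t)),  x_star(t) = 1/2 + eps*cos(omega*t)
    Returns the time-averages of x and y and the largest |x - 1/2| observed.
    """

    def rhs(t, x, y):
        x_star = 0.5 + eps * math.cos(omega * t)  # assumed moving equilibrium
        y_star = 0.5                              # assumed fixed equilibrium
        return alpha * (y - y_star), -alpha * (x - x_star)

    n = int(round(T / dt))
    x, y = 0.5, 0.5  # start at the equilibrium of the time-average game
    sum_x = sum_y = 0.0
    max_dev = 0.0
    for k in range(n):
        t = k * dt
        k1x, k1y = rhs(t, x, y)
        k2x, k2y = rhs(t + dt / 2, x + dt / 2 * k1x, y + dt / 2 * k1y)
        k3x, k3y = rhs(t + dt / 2, x + dt / 2 * k2x, y + dt / 2 * k2y)
        k4x, k4y = rhs(t + dt, x + dt * k3x, y + dt * k3y)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        y += dt / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
        sum_x += x
        sum_y += y
        max_dev = max(max_dev, abs(x - 0.5))
    return sum_x / n, sum_y / n, max_dev


if __name__ == "__main__":
    for ratio_label, omega in (("alpha/omega = 1", 1.0), ("alpha/omega = 2", 0.5)):
        avg_x, avg_y, dev = simulate(alpha=1.0, omega=omega)
        gap = math.hypot(avg_x - 0.5, avg_y - 0.5)
        print(f"{ratio_label}: max |x - 1/2| = {dev:.3f}, time-average gap = {gap:.4f}")
```

In this toy model the synchronized case should show a deviation that keeps growing with the horizon and a time-average that stays away from $1/2$, while the non-synchronized case stays bounded with a time-average gap that shrinks as $T$ grows. Because the model is unconstrained, $x$ is allowed to leave $[0,1]$.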

Paper Structure (42 sections, 4 theorems, 53 equations, 7 figures)

Introduction
Setting
Normal-Form Games
Periodic Games
Gradient Descent-Ascent
Eigenvalue of Learning
Example: Eigenvalue Invariant Game
Analysis of Eigenvalue Invariant Game
Solution of Learning Dynamics
Time-Average Analysis
Visualization and Interpretation
Theory on Eigenvalue Varying Games
Solution of Learning Dynamics
Time-Average Analysis
Experimental Verification for Theory
...and 27 more sections

Key Result

Theorem 1

In the eigenvalue invariant games, when $\alpha/\omega=1$ holds, $\bar{x}(T)$ and $\bar{y}(T)$ cycle around the Nash equilibrium.
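A scalar toy calculation gives some intuition for why the time-average cycles instead of converging when $\alpha/\omega=1$. This is a sketch under assumptions that are not taken from the paper: the deviations of both players from the equilibrium of the time-average game are combined into one complex variable $z$, learning rotates $z$ at the constant rate $\alpha$, and the moving equilibrium enters as a co-rotating forcing of assumed amplitude $\varepsilon$ (a real oscillation such as $\cos\omega t$ contains such a component).

```latex
\begin{align}
\dot{z}(t) &= -i\alpha\, z(t) + i\alpha\varepsilon\, e^{-i\omega t}, \qquad z(0)=0, \\
\omega=\alpha:\quad z(t) &= i\alpha\varepsilon\, t\, e^{-i\alpha t}, \\
\bar{z}(T) &= \frac{1}{T}\int_{0}^{T} z(t)\,dt
  = -\varepsilon\, e^{-i\alpha T} + \frac{i\varepsilon}{\alpha T}\left(e^{-i\alpha T}-1\right), \\
\omega\neq\alpha:\quad z(t) &= \frac{\alpha\varepsilon}{\alpha-\omega}\left(e^{-i\omega t}-e^{-i\alpha t}\right),
  \qquad \bar{z}(T)=O(1/T).
\end{align}
```

In the resonant case the trajectory grows linearly in $t$, and its time-average approaches a circle of radius $\varepsilon$ around the origin, traversed at frequency $\alpha$, so it neither converges nor escapes. Away from resonance every term oscillates with bounded amplitude, so the time-average decays like $1/T$.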

Figures (7)

Figure 1: Learning dynamics in the periodic game of matching pennies with the time-varying Nash equilibrium. (A) The trajectories of the learning dynamics. The panels show the cases of $\alpha/\omega=10$, $2/3$, $1$, and $2/3$, from left to right. In each panel, the x-, y-, z-axes indicate $x(t)$, $y(t)$, and $z(t)$, respectively. The black broken line shows the trajectory of the Nash equilibrium, i.e., $(x^{*}(t),y^{*}(t))$, and the cross markers show the edges of the oscillation of this equilibrium. In the cases of $\alpha/\omega\neq 1$, the rainbow color shows the time of a single cycle. In $\alpha/\omega=1$, the rainbow color shows the passing of time from blue to red. (B) The projection of the trajectories of Panels A to the plane of $x(t)-x^{*}(t)$ and $y(t)-y^{*}(t)$. In each panel, the black cross marker shows the projection of the Nash equilibrium. The color corresponds to that of Panels A. The orange star markers show the time-average value of the plotted trajectory. The value does not correspond to the equilibrium in $\alpha/\omega=1$. Otherwise, it corresponds. To simulate the learning dynamics, we use the Runge-Kutta fourth-order method with a step size of $1/40$. The initial strategies are set to the Nash equilibrium of the time-average game.
Figure 2: Experiments for learning dynamics in the eigenvalue invariant game for $\bar{\boldsymbol{U}}=((1.1,-1),(-1,0.9))$, $\Delta\boldsymbol{U}=((0.2,0),(0,0))$ without the constraint that the eigenvalues are fixed. (A) The maximum (circle) and minimum (cross) values of $x(t)$ within $0\le t\le T=3\times 10^{4}$. Here, the blue markers indicate that the value lies in the interior of $[0,1]$, while the red markers indicate that it lies outside. (B) The time-average $\bar{x}$ at the final time $T$. The orange marker is plotted when the time-average has sufficiently converged, i.e., it moves less than $10^{-3}$ over the last $10^{2}$ units of time. The red marker means that the time-average does not converge and takes a value outside the range of the y-axis. The gray broken line shows the analytical solution. The numerical method and parameters are the same as in Fig. 1.
Figure 3: Learning dynamics in games with more than two actions. The upper panels show an example of $3\times 3$ matrix games, while the lower panels show $6\times 6$ matrix games. (A) Visualization of $\bar{U}$, i.e., the time-average payoff matrix. The rows ($a_i$) and columns ($b_i$) indicate X's and Y's actions, respectively. Darker red shows that X receives more payoff, while darker blue shows that Y receives more. Note that each payoff matrix has a non-dominant structure, where each action wins against one of the opponent's actions but loses to another. We also remark that all the elements of the payoff matrix are non-zero (colored light red or light blue). We further consider a perturbation of the payoff matrix. Each element is independently perturbed by the four waves $\cos\omega t$, $\sin\omega t$, $\cos2\omega t$, and $\sin2\omega t$, each with a random amplitude $\sim\mathcal{N}(0,0.04^2)$ in the upper panels and $\sim\mathcal{N}(0,0.02^2)$ in the lower panels. (B) The maximum value of $x_1(t)$ over the whole time range $0\le t\le T=3\times 10^{4}$. The meanings of the markers and axes are the same as in Fig. 2-A. (C) The time-average value at the final time $T$. We regard the time-average as convergent when it moves less than $10^{-3}$ over the last $10^{3}$ units of time. The meanings of the markers and axes are the same as in Fig. 2-B. The numerical method and parameters are the same as in Fig. 1.
Figure 4: Learning dynamics under the boundary constraint of Eqs. \ref{x_FTRL_EU} and \ref{y_FTRL_EU}. All the parameters of the game and data are the same as in Fig. 2. (A) The trajectory of the learning dynamics for $\omega=\alpha$. The meanings of the plots and axes are the same as in Fig. 1-A. (B) The maximum value of $x(t)$ over the whole time range $0\le t\le T=10^{4}$. The meanings of the markers and axes are the same as in Fig. 2-A. (C) The time-average value at the final time $T$. We regard the time-average as convergent when it moves less than $10^{-3}$ over the last $10^{2}$ units of time. The meanings of the markers and axes are the same as in Fig. 2-B. The numerical method is the same as in Fig. 1, but the step size is $1/(4\times 10^{3})$.
Figure 5: Schematic of the division of the integration range. The x- and y-axes indicate the directions of $\tau$ and $t$, respectively. The whole colored area (i.e., $0\le\tau\le t\le T$) shows the range of integration. Each square has side length $2\pi/\omega$. (1) The red area shows the divergence term, which is $O(T)$ in the time-average and thus diverges. (2) The green area shows the oscillation term, which is $O(1)$ in the time-average and oscillates over time. (3) The blue area shows the convergence term, which is $O(1)$ in the time-average. (4) Last, the orange area shows the negligible term, which is $O(1/T)$ in the time-average and thus disappears over time.
...and 2 more figures

Theorems & Definitions (11)

Definition 1: Periodic game
Example 1: Eigenvalue invariant game
Theorem 1: Time-average cycling in $\alpha/\omega=1$
Theorem 2: Time-average convergence in $\alpha/\omega\neq 1$
Definition 2: $2\times 2$ smooth periodic games
Theorem 3: Time-average divergence in $\alpha/\omega\in\mathbb{N}$
Theorem 4: Time-average convergence in $\alpha/\omega\notin\mathbb{N}$
proof
proof
proof
...and 1 more
