Nash Equilibrium and Learning Dynamics in Three-Player Matching $m$-Action Games

Yuma Fujimoto; Kaito Ariu; Kenshi Abe

TL;DR

This work extends the classic two-player Matching Pennies to a three-player, multi-action framework ($m$-3MA) and provides a complete Nash equilibrium analysis, showing symmetry $x^*=y^*=z^*$ and a spectrum of equilibria (uniform, pure, double-roots). It then studies learning dynamics under continuous-time Follow the Regularized Leader (FTRL) with entropic and Euclidean regularizers, introducing the Lyapunov-like measure $V$ to capture synchronization among players. The dynamics, governed by three interaction parameters ($\alpha$, $\beta$, $\gamma$), exhibit cycling, convergence to two-action equilibria, or heteroclinic cycles depending on the signs of $\alpha$ and $\gamma$, with $\beta$ shaping rotational aspects. The results illuminate how triadic interactions shape global learning behavior and point to future work on memory and last-iterate convergence in multi-agent games.

Abstract

Learning in games studies the processes by which multiple players learn their optimal strategies through repeated play. The dynamics of learning between two players in zero-sum games, such as Matching Pennies, where their benefits are competitive, have already been well analyzed. However, the dynamics of learning among three players remain unexplored and challenging to analyze. In this study, we formulate a minimalistic game where three players compete to match their actions with one another. Although interaction among three players diversifies and complicates the Nash equilibria, we fully analyze the equilibria. We also discuss the dynamics of learning based on well-known algorithms in the Follow the Regularized Leader family. From both theoretical and experimental aspects, we characterize the dynamics by categorizing three-player interactions into three forces: to synchronize their actions, to switch their actions rotationally, and to seek competition.

Paper Structure (33 sections, 9 theorems, 13 equations, 4 figures)

Introduction
Preliminary
Three-Player Matching $m$-Action Games
Real-world application
Nash Equilibrium
Full Analysis
Visualization and Interpretation
Learning algorithm
Learning Dynamics
Characterization of $m$-3MA
Analysis and Experiments of Two-Action Games
Theoretical Analysis
Experimental Understanding
Cycling behavior ($\alpha=0$):
Convergence to the pure-strategy equilibria ($\alpha>0$):
...and 18 more sections

Key Result

Theorem 1

For any Nash equilibrium $(\boldsymbol{x}^*,\boldsymbol{y}^*,\boldsymbol{z}^*)$, $\boldsymbol{x}^*=\boldsymbol{y}^*=\boldsymbol{z}^*$ holds, and the set of one's strategies is given by

Figures (4)

Figure 1: A. Three players, X, Y, and Z, independently choose their actions. Players who choose the same action play the game together. B. In the game, players have a three-way deadlock relationship. X, Y, and Z have the advantage over Y, Z, and X, respectively. C. The three players receive their own scores as a result of their action choices. When two of the three players (X and Y in the left panel) choose the same action, the winner's score is $a$, while the loser's score is $b$, following the three-way deadlock relationship. An isolated player (Z in the center panel) who chooses a different action from the others receives a score of $c$. If all three players choose the same action (in the right panel), they each receive a score of $\epsilon$. Here, we assume $b<c<a$ and $b<\epsilon<a$.
Figure 2: The Nash equilibrium in $m$-3MA with $m=3$. This Nash equilibrium changes crucially depending on $\alpha$ and $\gamma$. Each panel shows the simplex $\Delta^{2}$ of one's strategy. The vertices of the triangles indicate the pure strategies, i.e., $\boldsymbol{x}=\boldsymbol{y}=\boldsymbol{z}=\boldsymbol{e}_i$, where all the players choose only action $i$. The blue stars represent $\mathcal{N}_{\rm U}(3)$ (and $\mathrm{Proj}^{-1}(\mathcal{N}_{\rm U}(2))$). The green stars represent $\mathcal{N}_{\rm P}(3)$. Finally, the red stars represent $\mathcal{N}_{\rm DR}(3)$. Note that since we consider the case of $m=3$, $\mathrm{Proj}^{-1}(\mathcal{N}_{\rm DR}(2))=\emptyset$ always holds. The positions of $\mathcal{N}_{\rm DR}$ change depending on $x_{{\rm ext}}$, which is determined by $\alpha$ and $\gamma$.
Figure 3: A. The dynamics of FTRL with the entropic regularizer in $m$-3MA with $m=2$. The dynamics are computed with the fourth-order Runge-Kutta method with a step size of $2\times 10^{-2}$ in all the panels. In all panels we set $(a,b,c)=(1,-1,0)$, in other words, $\beta=2$. In the left, center, and right panels, we set $\epsilon(=\alpha)=0.1$, $0$, $-0.1$, respectively. The red, green, and blue lines indicate the time series of $x_i$, $y_i$, and $z_i$, respectively, while the solid, broken, and dotted lines indicate $i=1$, $2$, and $3$, respectively. The solid black line indicates the time series of $V$. The initial strategies in each panel are randomly sampled from the strategy simplexes. In the left and right panels, the Nash equilibria are marked by black stars. In the center panel, all the points on the black solid line are Nash equilibria. The broken line in the right panel is the set of states satisfying the minimum $V$ condition. B. The dynamics of FTRL with the Euclidean regularizer. The method and simulation parameters are the same as in panel A.
Figure 4: The dynamics of FTRL with the entropic regularizer. The dynamics are computed with the fourth-order Runge-Kutta method with a step size of $2\times 10^{-2}$ in all the panels. For $\alpha>0$, $=0$, and $<0$, we set $(\epsilon,c)=(0.1,0)$, $(0,0)$, and $(-0.1,0)$, respectively. For $\gamma>0$, $=0$, and $<0$, we set $(a,b,c)=(1.1,-0.9,0)$, $(1,-1,0)$, and $(0.9,-1.1,0)$, respectively.
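
The scoring rule described in the Figure 1 caption can be written as a small function. The following is a minimal sketch, not the authors' code: the name `scores` and the integer encoding of actions are illustrative, and the default values are taken from the left panel of Figure 3. One case is an assumption: when all three players choose mutually different actions (possible only for $m\geq 3$), each is treated as isolated and receives $c$.

```python
def scores(x_act, y_act, z_act, a=1.0, b=-1.0, c=0.0, eps=0.1):
    """Return (score_X, score_Y, score_Z) for one round of the game.

    Cyclic advantage from Figure 1B: X beats Y, Y beats Z, Z beats X.
    """
    acts = (x_act, y_act, z_act)

    # All three players match: everyone receives eps.
    if x_act == y_act == z_act:
        return (eps, eps, eps)

    # Default: isolated players receive c.
    # ASSUMPTION: three mutually different actions also give c to everyone.
    out = [c, c, c]

    # Player w has the advantage over player (w + 1) mod 3.
    for winner in range(3):
        loser = (winner + 1) % 3
        if acts[winner] == acts[loser]:
            out[winner] = a
            out[loser] = b
    return tuple(out)
```

For example, `scores(0, 0, 1)` corresponds to X and Y matching while Z is isolated, so X wins, Y loses, and Z receives $c$.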

Theorems & Definitions (9)

Theorem 1: Nash equilibrium solution
Corollary 1: Main properties of the Nash equilibria
Lemma 1: Replicator dynamics and gradient ascent
Lemma 2: Properties of $V$
Lemma 3: Simplified dynamics for $m=2$
Theorem 2: Monotonicity of $V$
Theorem 3: Connection between $G$ and $V$
Corollary 2: Global behavior of dynamics
Theorem 4: Monotonicity of $V$ for $m$-3MA
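
The setup in the Figure 3 caption (entropic regularizer, $m=2$, fourth-order Runge-Kutta with step size $2\times 10^{-2}$) can be approximated numerically. The sketch below relies on the standard fact that continuous-time FTRL with the entropic regularizer reduces to replicator dynamics, and assumes that this is the form analyzed here. All function names (`payoff_tensors`, `replicator_field`, `rk4_step`, `simulate`) are hypothetical, the initial strategies are drawn from a Dirichlet distribution as one way to sample the simplex, and the measure $V$ is not computed because its definition is not given in this summary.

```python
import numpy as np


def payoff_tensors(m, a, b, c, eps):
    """U[p][i, j, k] is the score of player p when (X, Y, Z) play (i, j, k)."""
    # ASSUMPTION: profiles with three different actions give c to everyone.
    U = np.full((3, m, m, m), c, dtype=float)
    for i in range(m):
        for j in range(m):
            for k in range(m):
                acts = (i, j, k)
                if i == j == k:
                    U[:, i, j, k] = eps
                    continue
                # Player w has the advantage over player (w + 1) mod 3.
                for w in range(3):
                    l = (w + 1) % 3
                    if acts[w] == acts[l]:
                        U[w, i, j, k] = a
                        U[l, i, j, k] = b
    return U


def replicator_field(s, U):
    """Replicator vector field; s has shape (3, m) with rows x, y, z."""
    x, y, z = s
    # Expected score of each pure action against the other two mixed strategies.
    ux = np.einsum("ijk,j,k->i", U[0], y, z)
    uy = np.einsum("ijk,i,k->j", U[1], x, z)
    uz = np.einsum("ijk,i,j->k", U[2], x, y)
    u = np.stack([ux, uy, uz])
    mean_u = np.sum(s * u, axis=1, keepdims=True)
    return s * (u - mean_u)


def rk4_step(s, U, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = replicator_field(s, U)
    k2 = replicator_field(s + 0.5 * h * k1, U)
    k3 = replicator_field(s + 0.5 * h * k2, U)
    k4 = replicator_field(s + h * k3, U)
    return s + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def simulate(m=2, a=1.0, b=-1.0, c=0.0, eps=0.1, h=2e-2, steps=5000, seed=0):
    """Return a trajectory of shape (steps + 1, 3, m)."""
    rng = np.random.default_rng(seed)
    s = rng.dirichlet(np.ones(m), size=3)  # random initial x, y, z
    U = payoff_tensors(m, a, b, c, eps)
    traj = [s]
    for _ in range(steps):
        s = rk4_step(s, U, h)
        traj.append(s)
    return np.array(traj)
```

Changing `eps` to `0.0` or `-0.1` mirrors the center and right panels of Figure 3A. The uniform profile is a rest point of this vector field for any parameter values, because every action then earns the same expected score.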
