Asymptotic Extinction in Large Coordination Games

Desmond Chan, Bart De Keijzer, Tobias Galla, Stefanos Leonardos, Carmine Ventre

TL;DR

This work analyzes learning dynamics in large multi-player coordination games under Q-Learning, focusing on the emergence of Quantal Response Equilibria (QRE) and how exploration controls convergence. By employing a generating-functional (DMFT-like) framework, it derives an effective stochastic dynamics for the distribution of actions in the $N\to\infty$ limit and reveals a phase boundary at a critical exploration rate $T_{\text{crit}}$ that grows with both the number of players $p$ and payoff alignment $\Gamma$. A key finding is asymptotic extinction: for $\Gamma\ge 0$, a nonzero fraction of actions are played with vanishing probability as $N\to\infty$, while above $T_{\text{crit}}$ the dynamics converge to a unique fixed point; below this threshold, extinction is more pronounced and the fixed-point structure exhibits boundary behavior. The stability analysis yields a precise criterion, and in the large-$p$ limit the critical rate scales as $T_{\text{crit}} \approx (\hat{\Gamma}+1)\sqrt{e(p-1)}$ with $\hat{\Gamma}=\Gamma/(p-1)$, implying that identical-payoff coordination requires roughly twice the exploration of $p$-player zero-sum games. Collectively, the results illuminate how large action spaces and payoff alignment shape learning convergence and action-selection structure, informing exploration strategies in MARL for coordination tasks.
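The "roughly twice" comparison can be checked directly from the large-$p$ scaling $T_{\text{crit}} \approx (\hat{\Gamma}+1)\sqrt{e(p-1)}$. The sketch below is illustrative only: the function names are hypothetical, and it assumes (this is not stated in this summary) that identical payoffs correspond to $\Gamma = p-1$ and $p$-player zero-sum games to $\Gamma = -1$.

```python
import math


def t_crit_large_p(gamma: float, p: int) -> float:
    """Large-p approximation T_crit ~ (Gamma_hat + 1) * sqrt(e * (p - 1))."""
    if p < 2:
        raise ValueError("need at least two players")
    gamma_hat = gamma / (p - 1)  # rescaled payoff alignment
    return (gamma_hat + 1.0) * math.sqrt(math.e * (p - 1))


def coordination_to_zero_sum_ratio(p: int) -> float:
    """Ratio of the approximate critical rates for the two assumed endpoints.

    Assumption: Gamma = p - 1 for identical payoffs, Gamma = -1 for zero-sum.
    The ratio is undefined for p = 2, where the zero-sum value is 0.
    """
    return t_crit_large_p(p - 1, p) / t_crit_large_p(-1.0, p)


if __name__ == "__main__":
    for p in (3, 10, 100, 1000):
        print(p, coordination_to_zero_sum_ratio(p))
```

Under these assumptions the ratio equals $2(p-1)/(p-2)$, which tends to 2 as $p$ grows. Since the formula is itself a large-$p$ approximation, values at small $p$ should not be read as predictions.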

Abstract

We study the exploration-exploitation trade-off for large multiplayer coordination games where players strategise via Q-Learning, a common learning framework in multi-agent reinforcement learning. Q-Learning is known to have two shortcomings, namely non-convergence and potential equilibrium selection problems, when there are multiple fixed points, called Quantal Response Equilibria (QRE). Furthermore, whilst QRE have full support for finite games, it is not clear how Q-Learning behaves as the game becomes large. In this paper, we characterise the critical exploration rate that guarantees convergence to a unique fixed point, addressing the two shortcomings above. Using a generating-functional method, we show that this rate increases with the number of players and the alignment of their payoffs. For many-player coordination games with perfectly aligned payoffs, this exploration rate is roughly twice that of $p$-player zero-sum games. As for large games, we provide a structural result for QRE, which suggests that as the game size increases, Q-Learning converges to a QRE near the boundary of the simplex of the action space, a phenomenon we term asymptotic extinction, where a constant fraction of the actions are played with zero probability at a rate $o(1/N)$ for an $N$-action game.
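To illustrate the statement that QRE have full support in finite games, here is a minimal sketch that computes a two-player QRE by damped fixed-point iteration. It assumes the standard logit form of QRE, in which each player's mixed strategy is a softmax of expected payoffs at temperature $T$. The paper's own Definition 1 is not reproduced in this summary, and the iteration below is not the Q-Learning dynamic itself. The function names, the Gaussian payoffs, and the damping factor are all hypothetical choices.

```python
import math
import random


def softmax(values, temperature):
    """Numerically stable softmax of values / temperature."""
    top = max(values)
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def logit_qre(payoff_a, payoff_b, temperature, steps=2000, damping=0.5):
    """Damped fixed-point iteration for a two-player logit QRE (a sketch).

    payoff_a[i][j] is the payoff to player 1 and payoff_b[i][j] the payoff
    to player 2 when player 1 plays i and player 2 plays j.
    Convergence is not guaranteed in general, especially at low temperature.
    """
    n, m = len(payoff_a), len(payoff_a[0])
    x = [1.0 / n] * n
    y = [1.0 / m] * m
    for _ in range(steps):
        u_x = [sum(payoff_a[i][j] * y[j] for j in range(m)) for i in range(n)]
        u_y = [sum(payoff_b[i][j] * x[i] for i in range(n)) for j in range(m)]
        target_x = softmax(u_x, temperature)
        target_y = softmax(u_y, temperature)
        x = [(1 - damping) * a + damping * b for a, b in zip(x, target_x)]
        y = [(1 - damping) * a + damping * b for a, b in zip(y, target_y)]
    return x, y


def random_game(n, rng):
    """Hypothetical test game: independent standard-normal payoffs."""
    a = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    return a, b


if __name__ == "__main__":
    a, b = random_game(5, random.Random(0))
    x, y = logit_qre(a, b, temperature=2.0)
    print(x)
    print(y)
```

For any finite game and $T > 0$, every entry of the softmax is strictly positive, which is the full-support property. Asymptotic extinction concerns how small some of these entries become relative to $1/N$ as $N$ grows, and a fixed small game like this one cannot show that.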

Paper Structure

This paper contains 19 sections, 1 theorem, 89 equations, 12 figures, 2 tables, 1 algorithm.

Key Result

Proposition 1

Q-Learning converges to a unique fixed point when the parameters and the corresponding fixed point fulfil a stability relation in which $\phi$ is the proportion of non-extinct strategies, given by $\phi = P(z < z_{\text{crit}})$ with $z \sim \mathcal{N}(0,1)$.
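Because $\phi$ is a standard normal probability, it is simply the normal CDF evaluated at $z_{\text{crit}}$. The sketch below shows the conversion in both directions using Python's standard library. The function names are hypothetical, and it assumes that the extinct fraction is $1 - \phi$. The value of $z_{\text{crit}}$ itself comes from the paper's fixed-point relations, which are not reproduced here; the 0.74% figure is the extinction rate quoted in the Figure 1 caption for $\Gamma = 0$, $T = 1.8$, $p = 2$.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def surviving_fraction(z_crit: float) -> float:
    """phi = P(z < z_crit) for z ~ N(0, 1)."""
    return STD_NORMAL.cdf(z_crit)


def z_crit_from_extinct_fraction(extinct: float) -> float:
    """Invert 1 - phi = extinct for z_crit (assumes 0 < extinct < 1)."""
    return STD_NORMAL.inv_cdf(1.0 - extinct)


if __name__ == "__main__":
    # Assumed illustration: extinct fraction of 0.74%, as quoted for Figure 1.
    z = z_crit_from_extinct_fraction(0.0074)
    print(z, surviving_fraction(z))
```

Under this reading, an extinction rate below 1% corresponds to a $z_{\text{crit}}$ well into the upper tail of the standard normal.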

Figures (12)

  • Figure 1: Empirical cumulative density plot of the marginal likelihood of playing an action at the unique fixed point, for randomly generated games following the Q-Learning dynamic with $\Gamma = 0$, $T = 1.8$, $p = 2$. The plot is zoomed in on the bottom $1\%$ of least played actions, and $x$ is rescaled such that $x=1$ represents the average likelihood $(1/N)$. As $N$ increases, a probability mass appears to form near 0, representing actions going asymptotically extinct. The red line represents the theoretical estimate of the extinction rate $(0.74\%)$ in the $N \to \infty$ limit. In this limit, the cumulative density plot would begin on the red line.
  • Figure 2: Sketch of $x(z)$ for different values of $\Gamma$. The solution for $\Gamma > 0$ is double-valued below a critical $z$, as seen by the dotted lines. The bottom (solid) branch is of interest here.
  • Figure 3: Theoretical asymptotic extinction rate for varying numbers of players $p$, estimated from the fixed-point relations \ref{fixed_gamma_pos}. These estimates apply only to the unique fixed-point regime (to the right of the yellow dotted line representing the stability boundary, which is solved in the next segment). There is some numerical instability in the estimates (namely when $\Gamma < 0.1$, which is why the axes do not start at 0), but the figure roughly shows the scale of the expected extinction rate for varying $T$ away from the boundary.
  • Figure 4: Stability boundary obtained by solving \ref{contra2} for varying values of $p$, as a function of $T$ for $\hat{\Gamma} > 0$, where $\hat{\Gamma} = \Gamma / (p-1)$. To the right of the boundary, all Q-Learning trajectories converge to a unique fixed point in the large action size limit, $N\to \infty$. When $\hat{\Gamma} < 0$, we recover the results from \cite{sanders2018prevalence}. Our work extends the stability boundary to cover $\hat{\Gamma} > 0$.
  • Figure 5: Rescaled stability curves for selected values of $p$. The exploration rate, $T$, is rescaled by a factor of $\sqrt{e(p-1)}$, and the grey dashed line represents the straight line given by $T_{\text{crit}} = (\hat{\Gamma} + 1) \sqrt{e(p-1)}$, which appears to be the limiting behaviour as $p \to \infty$. The increasing agreement with the grey line for curves with larger values of $p$ suggests that this linear relationship is valid in the large-$p$ limit.
  • ...and 7 more figures

Theorems & Definitions (2)

  • Definition 1: Quantal Response Equilibrium (QRE)
  • Proposition 1