Multi-Agent Q-Learning Dynamics in Random Networks: Convergence due to Exploration and Sparsity

Aamal Hussain, Dan Leonte, Francesco Belardinelli, Raphael Huser, Dario Paccagnan

TL;DR

This work analyzes the convergence of Q-learning dynamics in network polymatrix games where agents interact over random graphs drawn from the Erdős-Rényi (ER) model and the Stochastic Block Model (SBM). It derives explicit, sparsity-dependent conditions on the exploration rates that guarantee, with high probability as the number of agents grows, a unique Quantal Response Equilibrium (QRE), and it links convergence to the spectral radius of the network's adjacency matrix. The theoretical bounds show that sparser networks allow reliable convergence at lower exploration rates, while densely connected graphs may hinder it, a finding corroborated by simulations. By framing the problem through monotone variational inequalities, the paper offers practical guidance for stable learning in large-scale, networked multi-agent systems and outlines extensions to broader game classes and dynamics.

Abstract

Beyond specific settings, many multi-agent learning algorithms fail to converge to an equilibrium solution, and instead display complex, non-stationary behaviours such as recurrent or chaotic orbits. In fact, recent literature suggests that such complex behaviours are likely to occur when the number of agents increases. In this paper, we study Q-learning dynamics in network polymatrix games where the network structure is drawn from classical random graph models. In particular, we focus on the Erdős-Rényi model, a well-studied model for social networks, and the Stochastic Block Model, which generalizes it by accounting for community structures within the network. In each setting, we establish sufficient conditions under which the agents' joint strategies converge to a unique equilibrium. We investigate how this condition depends on the exploration rates, payoff matrices and, crucially, the sparsity of the network. Finally, we validate our theoretical findings through numerical simulations and demonstrate that convergence can be reliably achieved in many-agent systems, provided network sparsity is controlled.
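
The role of sparsity can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes NumPy, and the helper names `sample_er_adjacency` and `spectral_radius` are hypothetical. It samples Erdős-Rényi graphs at several edge probabilities $p$ and prints the spectral radius of each adjacency matrix next to $Np$ for comparison.

```python
import numpy as np

def sample_er_adjacency(n, p, rng):
    """Symmetric 0/1 adjacency matrix of an Erdős-Rényi graph on n nodes, no self-loops."""
    # One independent coin flip per unordered pair (k, l) with k < l.
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

def spectral_radius(adj):
    """Largest absolute eigenvalue; eigvalsh is valid because adj is symmetric."""
    return float(np.max(np.abs(np.linalg.eigvalsh(adj))))

rng = np.random.default_rng(0)
n = 200  # number of agents (illustrative choice)
radii = {p: spectral_radius(sample_er_adjacency(n, p, rng)) for p in (0.05, 0.2, 0.8)}
for p, rho in radii.items():
    print(f"p={p:.2f}  spectral radius={rho:.1f}  n*p={n * p:.1f}")
```

Sparser graphs (smaller $p$) tend to have a smaller spectral radius, which is the quantity the summary ties to the exploration rate needed for convergence.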

Paper Structure

This paper contains 22 sections, 16 theorems, 45 equations, 4 figures.

Key Result

Lemma 1

Let $\clG = (\clN, \clE, (\clA)_k, (A, B)_{(k, l) \in \clE})$ be a network polymatrix game that satisfies the bimatrix-network assumption, and let $G$ be the adjacency matrix associated with the edge set $\clE$. If, for each agent $k$, the exploration rate $T_k$ satisfies [...], then the QRE $\bfx^*$ of the game $\clG$ is unique. In addition, for all initial conditions, trajectories of the Q-learning dynamics (QLD) converge to $\bfx^*$.
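
To make the objects in Lemma 1 concrete, here is a minimal simulation sketch. It rests on several assumptions: the paper's own statement of the dynamics is not reproduced in this summary, so the update below uses the standard form of Q-learning dynamics from the literature, $\dot{x}_{ki}/x_{ki} = r_{ki} - \langle x_k, r_k \rangle + T_k \sum_j x_{kj} \ln(x_{kj}/x_{ki})$; the Matching Pennies payoffs, the three-agent path graph, the Euler step size, and all function names are illustrative choices rather than the authors' setup. The QRE is checked as the fixed point $x_k = \mathrm{softmax}(r_k / T_k)$.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def rewards(x, adj, A, B):
    """Expected payoff of every action for every agent in a network polymatrix game.
    Assumed convention: on edge (k, l) with k < l, agent k plays matrix A and agent l plays B."""
    upper = np.triu(adj, k=1)  # keeps each edge once, oriented k < l
    return upper @ x @ A.T + upper.T @ x @ B.T

def qld_step(x, adj, A, B, T, dt):
    """One explicit Euler step of the (assumed) Q-learning dynamics."""
    r = rewards(x, adj, A, B)
    avg = np.sum(x * r, axis=1, keepdims=True)  # <x_k, r_k>
    log_x = np.log(x)
    # sum_j x_kj ln(x_kj / x_ki) = sum_j x_kj ln x_kj - ln x_ki, since x_k sums to 1
    explore = np.sum(x * log_x, axis=1, keepdims=True) - log_x
    x_new = x + dt * x * (r - avg + T * explore)
    return x_new / x_new.sum(axis=1, keepdims=True)  # guard against floating-point drift

rng = np.random.default_rng(1)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph, 3 agents
A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # Matching Pennies (illustrative payoffs)
B = -A.T                                   # zero-sum on every edge
T = 1.0                                    # common exploration rate (illustrative)

x = rng.dirichlet(np.ones(2), size=3)      # random interior initial strategies
for _ in range(5000):
    x = qld_step(x, adj, A, B, T, dt=0.01)

qre_residual = np.max(np.abs(x - softmax(rewards(x, adj, A, B) / T)))
print(np.round(x, 4))
print(qre_residual)
```

For this small zero-sum example the residual should shrink toward zero, meaning the final strategies satisfy the QRE fixed-point condition. Lowering `T` or swapping in non-zero-sum payoffs is one way to probe how exploration affects stability.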

Figures (4)

  • Figure 1: Proportion of converged runs after simulating the Q-learning dynamics (QLD) for choices of $(p, T)$ in network games where the network is drawn from the Erdős-Rényi model. We depict the (top) Network Sato Game, (middle) Network Shapley Game with $\beta = 0.2$, and (bottom) Conflict Network Game, each with varying $N$ across heatmaps. For the Network Sato Game ($\delta_I = 0.2$), we also depict the lower bound predicted by the Erdős-Rényi convergence theorem. For the Network Shapley Game ($\delta_I = 2.0$), the bound exceeds the range of the plot, and for the Conflict Network Game the random nature of the payoffs makes it impossible to depict the bound.
  • Figure 2: Variation of the empirical convergence boundary in the (left) Network Sato Game and (right) Network Shapley Game as $N$ increases, for different values of $p$. The $y$-axis shows the smallest exploration rate for which all simulations converged. In the Network Sato Game, we also plot the theoretical bound as a dashed line.
  • Figure 3: Proportion of converged runs after simulating the Q-learning dynamics (QLD) for choices of $(p, T)$ in a Network Sato Game where the network is drawn from the Stochastic Block Model, for varying numbers of agents $N$ and values of $q$. The theoretical lower bound from the Stochastic Block Model convergence theorem is plotted in white.
  • Figure 4: Proportion of converged runs after simulating the Q-learning dynamics (QLD) for choices of $(p, T)$ in a Network Shapley Game where the network is drawn from the Stochastic Block Model, for varying numbers of agents $N$ and values of $q$. We find that both the random graph model parameters $p, q, N$ and the exploration rate $T$ influence the stability of the dynamics.
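
Figures 3 and 4 vary the Stochastic Block Model parameters $p$ and $q$. The sketch below shows one way such networks could be sampled and how their spectral radius changes with $q$. It is an illustration under stated assumptions, not the authors' experimental code: $p$ is read as the within-community edge probability and $q$ as the across-community probability (the captions do not spell this out), the two equal communities of 50 agents are an arbitrary choice, and the helper names are hypothetical.

```python
import numpy as np

def sample_sbm_adjacency(sizes, p, q, rng):
    """Symmetric 0/1 adjacency matrix of a Stochastic Block Model, no self-loops.
    Assumed convention: p = within-community edge probability, q = across-community."""
    labels = np.repeat(np.arange(len(sizes)), sizes)   # community label of each agent
    same = labels[:, None] == labels[None, :]          # True where two agents share a community
    probs = np.where(same, p, q)                       # per-pair edge probability
    n = labels.size
    upper = np.triu(rng.random((n, n)) < probs, k=1)   # one draw per unordered pair
    return (upper | upper.T).astype(float)

def spectral_radius(adj):
    """Largest absolute eigenvalue of a symmetric matrix."""
    return float(np.max(np.abs(np.linalg.eigvalsh(adj))))

rng = np.random.default_rng(0)
sizes = [50, 50]  # two communities (illustrative)
results = {}
for q in (0.05, 0.2, 0.5):
    adj = sample_sbm_adjacency(sizes, p=0.5, q=q, rng=rng)
    results[q] = spectral_radius(adj)
    print(f"p=0.50  q={q:.2f}  spectral radius={results[q]:.1f}")
```

With $q = p$ every pair of agents is connected with the same probability, so the model reduces to the Erdős-Rényi case; smaller $q$ gives fewer across-community edges and, typically, a smaller spectral radius.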

Theorems & Definitions (29)

  • Definition 1: Nash Equilibrium
  • Definition 2: Quantal Response Equilibrium (QRE)
  • Definition 3: Intensity of Identical Interests
  • Remark
  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Lemma 3
  • Remark
  • Theorem 2
  • ...and 19 more