Multi-Agent Q-Learning Dynamics in Random Networks: Convergence due to Exploration and Sparsity

Aamal Hussain; Dan Leonte; Francesco Belardinelli; Raphael Huser; Dario Paccagnan

TL;DR

This work analyzes the convergence of Q-learning dynamics in network polymatrix games where agents interact over random graphs drawn from the Erdős-Rényi (ER) model or the Stochastic Block Model (SBM). It derives explicit, sparsity-driven conditions on the exploration rates that guarantee a unique Quantal Response Equilibrium (QRE) with high probability as the number of agents grows, linking convergence to the spectral radius of the network adjacency matrix. The theoretical bounds show that sparser networks allow reliable convergence at lower exploration rates, while densely connected graphs may hinder it, a finding corroborated by numerical simulations. By framing the problem in terms of monotone variational inequalities, the paper gives practical guidance for achieving stable learning in large-scale, network-based multi-agent systems and outlines avenues for extending the results to broader game classes and dynamical settings.

Abstract

Beyond specific settings, many multi-agent learning algorithms fail to converge to an equilibrium solution, and instead display complex, non-stationary behaviours such as recurrent or chaotic orbits. In fact, recent literature suggests that such complex behaviours are likely to occur when the number of agents increases. In this paper, we study Q-learning dynamics in network polymatrix games where the network structure is drawn from classical random graph models. In particular, we focus on the Erdős-Rényi model, a well-studied model for social networks, and the Stochastic Block Model, which generalizes the above by accounting for community structures within the network. In each setting, we establish sufficient conditions under which the agents' joint strategies converge to a unique equilibrium. We investigate how this condition depends on the exploration rates, payoff matrices and, crucially, the sparsity of the network. Finally, we validate our theoretical findings through numerical simulations and demonstrate that convergence can be reliably achieved in many-agent systems, provided network sparsity is controlled.
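
To make the role of sparsity concrete, the following is a minimal illustrative sketch, not the paper's own code or its convergence condition. It samples Erdős-Rényi adjacency matrices at a few edge probabilities and computes their spectral radius, the quantity the summary above ties to convergence. The helper names `sample_erdos_renyi` and `spectral_radius`, the graph size `n = 200`, and the chosen values of `p` are all assumptions made for illustration.

```python
import numpy as np


def sample_erdos_renyi(n, p, rng):
    """Return a symmetric 0/1 adjacency matrix of G(n, p) without self-loops.

    Hypothetical helper: each undirected edge is included independently
    with probability p.
    """
    upper = np.triu(rng.random((n, n)) < p, k=1)  # strict upper triangle
    return (upper | upper.T).astype(float)        # mirror to make it symmetric


def spectral_radius(adj):
    """Largest absolute eigenvalue of a symmetric matrix."""
    return float(np.max(np.abs(np.linalg.eigvalsh(adj))))


rng = np.random.default_rng(0)
n = 200  # number of agents (illustrative choice)

# Compare a sparse, a moderate, and a dense network.
radii = {}
for p in (0.05, 0.2, 0.5):
    adj = sample_erdos_renyi(n, p, rng)
    radii[p] = spectral_radius(adj)
    print(f"p={p:.2f}  spectral radius={radii[p]:.1f}  n*p={n * p:.1f}")
```

In this sketch the spectral radius grows with the edge probability, roughly tracking the expected degree `n * p`. This is consistent with the summary's point that denser networks need more exploration to guarantee convergence, although the exact threshold depends on the payoff matrices and is not reproduced here.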
