A Variational Inequality Approach to Independent Learning in Static Mean-Field Games
Batuhan Yardim, Semih Cayci, Niao He
TL;DR
This work introduces a decentralized learning framework for static mean-field games (SMFGs) with very large populations. It anchors learning in a variational-inequality (VI) formulation of the infinite-agent limit and develops independent-learning algorithms under full and bandit feedback, augmented by $\tau$-Tikhonov regularization to stabilize the updates. The authors give explicit finite-sample bounds on exploitability, which measures how far a policy is from a Nash equilibrium (NE), and establish convergence results for both the full-information and bandit settings, including optimal parameter choices when the population size is known. Experiments on synthetic problems, city traffic, and Tor network access corroborate the theory: the mean-field NE characterizes the limiting behavior as the number of agents $N$ grows, and independent-learning schemes attain near-equilibrium performance without central coordination.
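To make the VI view and the role of $\tau$-Tikhonov regularization concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' algorithm. It assumes a hypothetical three-action congestion payoff $R(\pi)_k = r_k - c_k\pi_k$, treats the mean-field NE as the solution of the VI with operator $F(\pi) = -R(\pi)$ over the probability simplex, adds $\tau\pi$ to $F$ (a standard form of Tikhonov regularization, assumed here), and runs plain projected gradient. The function names and parameter values are illustrative only.

```python
import numpy as np


def project_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of a vector v onto the probability simplex."""
    u = np.sort(v)[::-1]                           # coordinates in descending order
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u * idx > css - 1.0)[0][-1]   # last active coordinate
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def congestion_payoff(pi: np.ndarray, r: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Hypothetical payoff: action k's reward r_k falls linearly with its load pi_k."""
    return r - c * pi


def exploitability(pi: np.ndarray, r: np.ndarray, c: np.ndarray) -> float:
    """Gain of a best response against a population playing pi (unregularized)."""
    R = congestion_payoff(pi, r, c)
    return float(R.max() - pi @ R)


def regularized_projected_gradient(r, c, tau=0.01, eta=0.1, iters=500):
    """Projected gradient on the VI with operator F_tau(pi) = -R(pi) + tau * pi.

    The tau * pi term makes F_tau strongly monotone even when -R is only
    monotone, so the projected iteration contracts toward a unique solution.
    """
    pi = np.full(r.size, 1.0 / r.size)             # start from the uniform policy
    for _ in range(iters):
        step = congestion_payoff(pi, r, c) - tau * pi  # equals -F_tau(pi)
        pi = project_simplex(pi + eta * step)
    return pi


r = np.array([1.0, 0.8, 0.5])
c = np.ones(3)
pi = regularized_projected_gradient(r, c)
print(pi.round(4), exploitability(pi, r, c))
```

In this toy instance the regularized solution satisfies $r_k - (1+\tau)\pi_k = \text{const}$ on its support, so its exploitability is of order $\tau$: in the sketch, a smaller $\tau$ reduces this bias but also weakens the strong monotonicity that makes the iteration contract.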
Abstract
Competitive games involving thousands or even millions of players are prevalent in real-world contexts, such as transportation, communications, and computer networks. However, learning in these large-scale multi-agent environments presents a grand challenge, often referred to as the "curse of many agents". In this paper, we formalize and analyze the Static Mean-Field Game (SMFG) under both full and bandit feedback, offering a generic framework for modeling large population interactions while enabling independent learning. We first establish close connections between SMFG and variational inequality (VI), showing that SMFG can be framed as a VI problem in the infinite agent limit. Building on the VI perspective, we propose independent learning and exploration algorithms that efficiently converge to approximate Nash equilibria with a finite number of agents. Theoretically, we provide explicit finite-sample complexity guarantees for independent learning across various feedback models in repeated play scenarios, assuming (strongly) monotone payoffs. Numerically, we validate our results through both simulations and real-world applications in city traffic and network access management.
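The setting of finitely many independent learners with bandit feedback can be illustrated with a hypothetical Python sketch; it is not the paper's algorithm. Assumptions: a hypothetical three-action congestion payoff $R(\mu)_k = r_k - \mu_k$ evaluated at the empirical action distribution $\mu$ of $N$ agents; each agent observes only the payoff of the action it played and forms an importance-weighted estimate of the full payoff vector; exploration is enforced by keeping every action probability at least $\varepsilon$; and each update is a $\tau$-regularized projected gradient step. All names and parameter values are illustrative.

```python
import numpy as np


def project_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of each row of v onto the probability simplex."""
    u = -np.sort(-v, axis=1)                       # each row in descending order
    css = np.cumsum(u, axis=1)
    idx = np.arange(1, v.shape[1] + 1)
    rho = np.count_nonzero(u * idx > css - 1.0, axis=1)  # active coordinates per row
    theta = (css[np.arange(v.shape[0]), rho - 1] - 1.0) / rho
    return np.maximum(v - theta[:, None], 0.0)


def project_truncated_simplex(v: np.ndarray, eps: float) -> np.ndarray:
    """Project rows onto {p : p >= eps, sum(p) = 1}; requires K * eps < 1."""
    mass = 1.0 - v.shape[1] * eps
    return eps + mass * project_simplex((v - eps) / mass)


def exploitability(p: np.ndarray, r: np.ndarray) -> float:
    """Best-response gain against a population playing p, payoff r - p."""
    R = r - p
    return float(R.max() - p @ R)


def independent_bandit_learning(r, n_agents=100, rounds=5000, eta=0.002,
                                tau=0.01, eps=0.05, seed=0):
    """Each agent updates its own policy from its own bandit payoff only."""
    rng = np.random.default_rng(seed)
    K = r.size
    pi = np.full((n_agents, K), 1.0 / K)           # every agent starts uniform
    rows = np.arange(n_agents)
    for _ in range(rounds):
        # Each agent samples an action from its own policy (inverse-CDF sampling).
        u = rng.random(n_agents)[:, None]
        actions = np.minimum((u > np.cumsum(pi, axis=1)).sum(axis=1), K - 1)
        mu_hat = np.bincount(actions, minlength=K) / n_agents  # empirical distribution
        payoffs = r[actions] - mu_hat[actions]     # bandit feedback: own payoff only
        # Importance-weighted estimate of the full payoff vector.
        g_hat = np.zeros((n_agents, K))
        g_hat[rows, actions] = payoffs / pi[rows, actions]
        pi = project_truncated_simplex(pi + eta * (g_hat - tau * pi), eps)
    return pi


r = np.array([1.0, 0.8, 0.5])
pi = independent_bandit_learning(r)
mean_policy = pi.mean(axis=0)
print(mean_policy.round(3), exploitability(mean_policy, r))
```

Individual policies may stay spread out, since in a population game many individual mixtures are consistent with the same population average; in this sketch the quantity to watch is the exploitability of the population-average policy, which should fall well below that of the uniform policy.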
