A Variational Inequality Approach to Independent Learning in Static Mean-Field Games

Batuhan Yardim, Semih Cayci, Niao He

TL;DR

This work introduces a decentralized learning framework for static mean-field games (SMFGs) with a very large population. It anchors learning to a variational-inequality (VI) formulation of the infinite-agent limit and develops independent-learning algorithms under full and bandit feedback, augmented by a $\tau$-Tikhonov regularization to stabilize updates. The authors provide explicit finite-sample bounds on exploitability (approximate NE quality) and establish convergence results for both full-information and bandit settings, including optimal parameter choices when the population size is known. Empirical validation on synthetic problems, city traffic, and Tor network access corroborates the theory, showing the mean-field NE effectively characterizes limiting behavior as $N$ grows and that independent-learning schemes can attain near-equilibrium performance without central coordination.

Abstract

Competitive games involving thousands or even millions of players are prevalent in real-world contexts, such as transportation, communications, and computer networks. However, learning in these large-scale multi-agent environments presents a grand challenge, often referred to as the "curse of many agents". In this paper, we formalize and analyze the Static Mean-Field Game (SMFG) under both full and bandit feedback, offering a generic framework for modeling large population interactions while enabling independent learning. We first establish close connections between SMFG and variational inequality (VI), showing that SMFG can be framed as a VI problem in the infinite agent limit. Building on the VI perspective, we propose independent learning and exploration algorithms that efficiently converge to approximate Nash equilibria, when dealing with a finite number of agents. Theoretically, we provide explicit finite sample complexity guarantees for independent learning across various feedback models in repeated play scenarios, assuming (strongly-)monotone payoffs. Numerically, we validate our results through both simulations and real-world applications in city traffic and network access management.
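As a rough illustration of independent learning with Tikhonov regularization, the sketch below runs projected gradient ascent for $N$ agents on a toy game with non-increasing payoffs. It is not the authors' algorithm: the payoff form `b - c * mu`, the step size `eta`, the regularization weight `tau`, and all function names are assumptions made for this example. Each agent updates from the observed payoff vector and its own policy only, with no central coordination.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]                      # entries in decreasing order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def payoff(mu, b, c):
    """Toy non-increasing payoff (assumed): action a pays b[a] - c[a] * mu[a]."""
    return b - c * mu


def independent_learning(n_agents=100, n_rounds=1000, eta=0.05, tau=0.01, seed=0):
    """Repeated play in which every agent runs its own regularized update."""
    rng = np.random.default_rng(seed)
    b = np.array([1.0, 0.9, 0.8, 0.7])  # hypothetical base payoffs
    c = np.ones_like(b)                 # hypothetical congestion costs
    n_actions = b.size
    policies = rng.dirichlet(np.ones(n_actions), size=n_agents)
    for _ in range(n_rounds):
        # Each agent samples one action from its own policy (inverse CDF).
        u = rng.random((n_agents, 1))
        actions = (u > np.cumsum(policies, axis=1)).sum(axis=1)
        actions = np.minimum(actions, n_actions - 1)  # guard against rounding
        # Empirical action distribution of the finite population.
        mu_hat = np.bincount(actions, minlength=n_actions) / n_agents
        f = payoff(mu_hat, b, c)  # full feedback: payoffs of all actions
        # Independent update: agent i uses only f and its own policy.
        for i in range(n_agents):
            grad = f - tau * policies[i]  # Tikhonov-regularized direction
            policies[i] = project_simplex(policies[i] + eta * grad)
    return policies, b, c


def mean_field_exploitability(pi, b, c):
    """Gain from a best response when the whole population plays pi."""
    f = payoff(pi, b, c)
    return float(np.max(f) - pi @ f)
```

Calling `independent_learning()` and passing the average of the returned policies to `mean_field_exploitability` gives a simple diagnostic of how close the population is to the mean-field equilibrium of this toy game. A bandit-feedback variant would replace `f` with an estimate built from the single payoff each agent observes, which this sketch does not attempt.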

Paper Structure

This paper contains 34 sections, 16 theorems, 97 equations, 8 figures, 2 algorithms.

Key Result

Lemma 1

For any policy profile $(\pmb{\pi}^1, \ldots, \pmb{\pi}^N) \in \Delta_{\mathcal{A}}^N$, it holds that

Figures (8)

  • Figure 1: Results for numerical problems kl, bb, linear, utd. (a) Maximum exploitability of $N$ agents at convergence as a function of $N$ for different problems, (b) The max. exploitability among $N$ agents during training with linear payoff (linear), for different $N$, (c) The mean $\ell_2$ distance of agent policies during training to the MF-NE in the Zurich traffic flow simulation problem (utd).
  • Figure 2: Results for the Tor network experiment. (a) Average policies (probability distribution) over 5 servers of the 100 agents in the Tor network access experiment, (b) Empirical distribution of agents over Tor entry servers during training on 5 servers (different colors indicate different entry servers), (c) Average waiting times for Tor network access during training.
  • Figure 3: The (smoothed) maximum exploitability $\max_{i\in\mathcal{N}} \phi^i(\{\pmb{\pi}^j\}_{j=1}^N)$ among $N$ agents throughout learning with full feedback for three different $N$, on the problems (a) linear payoffs, (b) exponentially decreasing payoffs, (c) payoffs with KL potential and (d) the beach bar payoffs.
  • Figure 4: The mean $\ell_2$ distance to MF-NE given by $\frac{1}{N}\sum_{i\in\mathcal{N}} \| \pmb{\pi}^i - \pmb{\pi}^*\|_2$ with $N$ agents throughout learning with full feedback for three different $N$, on the problems (a) linear payoffs, (b) exponentially decreasing payoffs, (c) payoffs with KL potential and (d) the beach bar payoffs.
  • Figure 5: The (smoothed) maximum exploitability $\max_{i\in\mathcal{N}} \phi^i(\{\pmb{\pi}^j\}_{j=1}^N)$ among $N$ agents throughout learning with bandit feedback for three different $N$, on the problems (a) linear payoffs, (b) exponentially decreasing payoffs, (c) payoffs with KL potential and (d) the beach bar payoffs.
  • ...and 3 more figures

Theorems & Definitions (26)

  • Definition 1: Expected payoff, exploitability, Nash equilibrium
  • Definition 2: MF-NE
  • Lemma 1
  • Lemma 2: $V^i, \mathcal{E}^i_{\text{exp}}$ are Lipschitz
  • Example 1: Non-increasing payoffs
  • Example 2: Multi-player MAB with Collisions
  • Remark 1: SMFG is not a potential game
  • Remark 2: Existence and Uniqueness of MF-NE
  • Theorem 1
  • Theorem 2
  • ...and 16 more