Graphon Mean Field Games with a Representative Player: Analysis and Learning Algorithm

Fuzhong Zhou, Chenyu Zhang, Xu Chen, Xuan Di

TL;DR

This work develops a discrete-time graphon mean-field game formulated around a single representative player who interacts with heterogeneous agents via a graphon $W\in L_1^+[0,1]^2$. It establishes existence and uniqueness of a $W$-equilibrium under mild assumptions, and shows that such an equilibrium induces approximate equilibria for large, dense finite networks. A novel oracle-free online learning algorithm combines SARSA-based policy estimation with MCMC-based population estimation and comes with a nonasymptotic sample complexity analysis. The approach is validated through numerical experiments on flocking, SIS, and investment graphon games, which show convergence of the algorithm and interpretable graphon mean-field equilibrium (GMFE) patterns. Overall, the paper provides a rigorous, scalable framework for analyzing and learning graphon games with heterogeneous network interactions.

Abstract

We propose a discrete-time graphon game formulation on continuous state and action spaces using a representative player to study stochastic games with heterogeneous interaction among agents. This formulation admits both philosophical and mathematical advantages compared to a widely adopted formulation using a continuum of players. We prove the existence and uniqueness of the graphon equilibrium under mild assumptions, and show that this equilibrium can be used to construct an approximate solution for finite-player games on networks, which are challenging to analyze and solve due to the curse of dimensionality. An online oracle-free learning algorithm is developed to solve for the equilibrium numerically, and a sample complexity analysis is provided for its convergence.
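
To make the link between a graphon and a finite-player network concrete, the following is a minimal sketch of the standard way to sample a dense random graph from a graphon. It is an illustration under stated assumptions, not the paper's construction: the function names `sample_graph_from_graphon` and `neighborhood_average` are hypothetical, and the graphon is assumed here to be symmetric with values in $[0,1]$ so that it can serve as an edge probability (the paper allows the more general class $L_1^+[0,1]^2$).

```python
import numpy as np


def sample_graph_from_graphon(W, n, rng):
    """Sample n labels and an undirected simple graph from a graphon W.

    Assumption (not from the paper): W is symmetric, vectorized over
    NumPy arrays, and takes values in [0, 1].
    """
    u = rng.random(n)                        # labels u_i ~ Uniform[0, 1]
    probs = W(u[:, None], u[None, :])        # edge probabilities W(u_i, u_j)
    # Draw each edge once on the strict upper triangle, then symmetrize.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(float)
    return u, adj


def neighborhood_average(adj, x):
    """Dense-graph aggregate (1/n) * sum_j adj[i, j] * x[j] for every player i."""
    return adj @ x / adj.shape[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical example graphon: players with similar labels interact more.
    W = lambda u, v: 1.0 - np.abs(u - v)
    labels, adj = sample_graph_from_graphon(W, 200, rng)
    states = rng.normal(size=200)
    print(neighborhood_average(adj, states)[:5])
```

The $1/n$ scaling in `neighborhood_average` is the usual normalization for dense graphs, under which each player's aggregate stays of order one as the number of players grows.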

Paper Structure (64 sections, 25 theorems, 194 equations, 8 figures, 4 tables, 3 algorithms)

This paper contains 64 sections, 25 theorems, 194 equations, 8 figures, 4 tables, 3 algorithms.

Key Result

Theorem 4.5

Suppose the existence assumption (assump:existence in the source) holds. Then there exists a $W$-equilibrium $(\widehat{\mu}, \widehat{\pi})$. Moreover, the equilibrium optimal policy $\widehat{\pi}$ can be chosen to be a closed-loop policy.

Figures (8)

  • Figure 1: Algorithm performance (Flocking-Graphon)
  • Figure 2: GMFE (Flocking-Graphon)
  • Figure 3: Flocking-Graphon: Algorithm performance. We show the convergence gap (top), W1-distance (middle) and exploitability (bottom) for four types of graphs. The exploitability indicates how much an agent can improve by deviating from the policy used by the rest of the population. Mathematically, it is calculated as $|J^{\mu}_\pi-J^{\mu}_{\pi^*(\mu)}|$, the gap between the value of the policy adopted by the population and the value of the best policy that an agent can achieve in response to the population state.
  • Figure 4: Flocking-Graphon: GMFE. Top: The velocity control at position $x=0$. The x-axis denotes the time horizon and the y-axis denotes the velocity at equilibrium. The color bar denotes the label state. Bottom: The expected position $x$ over time, which can be regarded as the centroid of the population.
  • Figure 5: SIS-Graphon: Algorithm performance. We demonstrate the convergence gap (top), W1-distance (middle) and exploitability (bottom) corresponding to four types of graphs.
  • ...and 3 more figures
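
The exploitability $|J^{\mu}_\pi-J^{\mu}_{\pi^*(\mu)}|$ from the Figure 3 caption can be illustrated on a toy problem. The sketch below is not the paper's implementation: it assumes a finite-horizon problem with finitely many states and actions (the paper works on continuous spaces with graphon labels), and it assumes the population flow $\mu$ has already been frozen into the transition tensor `P` and reward tensor `r`. All function and variable names are hypothetical.

```python
import numpy as np


def policy_value(P, r, pi, rho0):
    """Expected total reward J of a time-dependent policy.

    P[t, s, a, s2] : transition probabilities with the population flow frozen in
    r[t, s, a]     : rewards with the population flow frozen in
    pi[t, s, a]    : action probabilities of the evaluated policy
    rho0[s]        : initial state distribution
    """
    T = r.shape[0]
    V = np.zeros(r.shape[1])
    for t in reversed(range(T)):
        Q = r[t] + P[t] @ V              # shape (S, A)
        V = (pi[t] * Q).sum(axis=1)      # average Q over the policy's actions
    return float(rho0 @ V)


def best_response(P, r, rho0):
    """Backward dynamic programming: best-response value and greedy policy."""
    T, S, A = r.shape
    V = np.zeros(S)
    pi = np.zeros((T, S, A))
    for t in reversed(range(T)):
        Q = r[t] + P[t] @ V
        pi[t, np.arange(S), Q.argmax(axis=1)] = 1.0
        V = Q.max(axis=1)
    return float(rho0 @ V), pi


def exploitability(P, r, pi, rho0):
    """|J_pi - J_best| against the same frozen population flow."""
    best_value, _ = best_response(P, r, rho0)
    return abs(policy_value(P, r, pi, rho0) - best_value)
```

In this toy setting the exploitability is zero exactly when `pi` is itself a best response to the frozen population flow, which is the fixed-point property an equilibrium policy should satisfy.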

Theorems & Definitions (50)

  • Definition 3.1
  • Definition 4.1
  • Remark 4.2
  • Remark 4.3: GMFG as MFG with augmented state space
  • Theorem 4.5
  • Theorem 4.7
  • Theorem 4.9
  • Remark 4.10
  • Remark 5.2
  • Theorem 5.4
  • ...and 40 more