Bench-MFG: A Benchmark Suite for Learning in Stationary Mean Field Games

Lorenzo Magnino, Jiacheng Shen, Matthieu Geist, Olivier Pietquin, Mathieu Laurière

TL;DR

Bench-MFG addresses the lack of standardized evaluation for stationary Mean Field Games (MFGs) with a unified benchmark suite: a taxonomy of MFG classes, prototypical environments for each class, and a procedural generator of random instances (MF-Garnets). It benchmarks best-response (BR) based fixed-point methods, policy-iteration methods, and exploitability-minimization solvers (including MF-PSO), using a fast, open-source JAX implementation to stress-test robustness and generalization across regimes. The work provides practical guidelines for rigorous, reproducible MFG experimentation and shows how problem structure (e.g., dynamics-coupled or monotone settings) affects algorithm performance. By enabling systematic stress-testing and cross-class comparisons, it supports more reliable development and evaluation of learning methods for large-scale multi-agent systems.

Abstract

The intersection of Mean Field Games (MFGs) and Reinforcement Learning (RL) has fostered a growing family of algorithms designed to solve large-scale multi-agent systems. However, the field currently lacks a standardized evaluation protocol, forcing researchers to rely on bespoke, isolated, and often simplistic environments. This fragmentation makes it difficult to assess the robustness, generalization, and failure modes of emerging methods. To address this gap, we propose a comprehensive benchmark suite for MFGs (Bench-MFG), focusing on the discrete-time, discrete-space, stationary setting for the sake of clarity. We introduce a taxonomy of problem classes, ranging from no-interaction and monotone games to potential and dynamics-coupled games, and provide prototypical environments for each. Furthermore, we propose MF-Garnets, a method for generating random MFG instances to facilitate rigorous statistical testing. We benchmark a variety of learning algorithms across these environments, including a novel black-box approach (MF-PSO) for exploitability minimization. Based on our extensive empirical results, we propose guidelines to standardize future experimental comparisons. Code available at https://github.com/lorenzomagnino/Bench-MFG.
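
The abstract uses exploitability as its evaluation metric without defining it in this extract. The sketch below illustrates one common form of the quantity in a discrete, stationary setting: the gain a single agent obtains by switching to a best response while the population keeps playing $\pi$ and sits at the induced stationary distribution $\mu^\pi$. Everything specific here is an assumption made for illustration and is not taken from the paper or its repository: the discounted criterion with factor `gamma`, the initial state drawn from $\mu^\pi$, transitions that do not depend on $\mu$, and the two-state crowd-aversion game (which is not one of the benchmark environments).

```python
from typing import Callable, List, Optional

Policy = List[List[float]]        # pi[x][a]: probability of action a in state x
Kernel = List[List[List[float]]]  # P[x][a][y]: probability of moving from x to y under a
Reward = Callable[[int, int, List[float]], float]  # r(x, a, mu)


def stationary_distribution(P: Kernel, pi: Policy, iters: int = 5000) -> List[float]:
    """Iterate mu <- mu P^pi from the uniform distribution (assumes an ergodic chain)."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.0] * n
        for x in range(n):
            for a, p_a in enumerate(pi[x]):
                for y in range(n):
                    nxt[y] += mu[x] * p_a * P[x][a][y]
        mu = nxt
    return mu


def q_values(P: Kernel, r: Reward, mu: List[float], V: List[float], gamma: float):
    """Q[x][a] = r(x, a, mu) + gamma * sum_y P[x][a][y] * V[y], with mu held fixed."""
    n = len(P)
    return [[r(x, a, mu) + gamma * sum(P[x][a][y] * V[y] for y in range(n))
             for a in range(len(P[x]))] for x in range(n)]


def evaluate(P: Kernel, r: Reward, mu: List[float], gamma: float,
             pi: Optional[Policy] = None, iters: int = 2000) -> List[float]:
    """Value of pi against a frozen mu; pi=None returns the best-response value."""
    V = [0.0] * len(P)
    for _ in range(iters):
        Q = q_values(P, r, mu, V, gamma)
        if pi is None:
            V = [max(q) for q in Q]  # value iteration (best response)
        else:
            V = [sum(p * q for p, q in zip(pi[x], Q[x])) for x in range(len(P))]
    return V


def exploitability(P: Kernel, r: Reward, pi: Policy, gamma: float = 0.9) -> float:
    """max_pi' J(pi', mu^pi) - J(pi, mu^pi), with the initial state drawn from mu^pi."""
    mu = stationary_distribution(P, pi)
    v_best = evaluate(P, r, mu, gamma)
    v_pi = evaluate(P, r, mu, gamma, pi)
    return sum(m * (b - v) for m, b, v in zip(mu, v_best, v_pi))


# Hypothetical toy game: action a targets state a and succeeds with probability 0.9.
P = [[[0.9, 0.1], [0.1, 0.9]] for _ in range(2)]


def crowd_averse_reward(x: int, a: int, mu: List[float]) -> float:
    """Agents dislike crowded states: r(x, a, mu) = -mu[x]."""
    return -mu[x]


uniform = [[0.5, 0.5], [0.5, 0.5]]      # spreads the population evenly
always_zero = [[1.0, 0.0], [1.0, 0.0]]  # everyone crowds into state 0

print(exploitability(P, crowd_averse_reward, uniform))
print(exploitability(P, crowd_averse_reward, always_zero))
```

Under these assumptions the uniform policy has exploitability close to zero (no unilateral deviation helps when both states are equally crowded), while the policy that sends everyone to state 0 is strongly exploitable, since a deviating agent can move to the nearly empty state.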

Paper Structure (32 sections, 3 theorems, 20 equations, 20 figures, 2 tables, 8 algorithms)


Key Result

Proposition 4.6

Consider a separable reward function $r(x, a, \mu) = \psi(x, a) + g(x, \mu)$, where $g: \mathcal{X} \times \Delta_{\mathcal{X}} \to \mathbb R$ is continuously differentiable with respect to $\mu$. The game is a potential MFG if and only if the Jacobian matrix of the population-dependent term is symm
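
The statement above is cut off in this extract. Assuming the condition concerns symmetry of the Jacobian with entries $\partial g(x, \mu) / \partial \mu(y)$, the check can be sketched numerically with central finite differences. The helper names, the linear coupling $g(x, \mu) = -\sum_y K(x, y)\,\mu(y)$, and both kernels are hypothetical illustrations, not objects from the paper. The sketch also assumes $g$ extends smoothly to a neighborhood of the simplex, since the perturbations leave it; the exact condition in the paper may differ (for instance, by restricting to directions tangent to the simplex).

```python
from typing import Callable, List, Sequence

Coupling = Callable[[int, Sequence[float]], float]  # g(x, mu)


def jacobian_wrt_mu(g: Coupling, mu: Sequence[float], h: float = 1e-6) -> List[List[float]]:
    """Central-difference estimate of J[x][y] = d g(x, mu) / d mu(y)."""
    n = len(mu)
    jac = [[0.0] * n for _ in range(n)]
    for y in range(n):
        up = list(mu)
        up[y] += h
        down = list(mu)
        down[y] -= h
        for x in range(n):
            jac[x][y] = (g(x, up) - g(x, down)) / (2.0 * h)
    return jac


def is_symmetric(jac: List[List[float]], tol: float = 1e-5) -> bool:
    """True if J[x][y] and J[y][x] agree within tol for every pair of states."""
    n = len(jac)
    return all(abs(jac[x][y] - jac[y][x]) <= tol
               for x in range(n) for y in range(x + 1, n))


def make_linear_coupling(kernel: List[List[float]]) -> Coupling:
    """Hypothetical crowd-aversion term g(x, mu) = -sum_y kernel[x][y] * mu[y]."""
    def g(x: int, mu: Sequence[float]) -> float:
        return -sum(k * m for k, m in zip(kernel[x], mu))
    return g


mu = [0.5, 0.3, 0.2]
symmetric_kernel = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
skewed_kernel = [[2.0, 1.0, 0.0], [0.0, 2.0, 1.0], [0.0, 0.0, 2.0]]

symmetric_case = is_symmetric(jacobian_wrt_mu(make_linear_coupling(symmetric_kernel), mu))
skewed_case = is_symmetric(jacobian_wrt_mu(make_linear_coupling(skewed_kernel), mu))
print(symmetric_case, skewed_case)
```

For a linear coupling the Jacobian is just $-K$, so the check reduces to whether the kernel itself is symmetric. For a nonlinear $g$ the test would have to hold at every $\mu$, so a numerical check of this kind can refute the condition at a sampled point but cannot prove it.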

Figures (20)

  • Figure 1: Bench-MFG Overview
  • Figure 2: NI-MFG. Move Forward. (top) Exploitabilities (bot.) Equilibrium $(\mu^*, \pi^*)$ for PI.
  • Figure 3: C-MFG. Coordination Game. Params. $C=80, \alpha=1$ (top) Exploitabilities (bot.) Equilibrium $(\mu^*, \pi^*)$ for PI.
  • Figure 4: LL-MFG. Beach Bar Problem. Params. $\alpha=5, c_2=5, c_1=2$ (top) Exploitabilities (bot.) Equilibrium $(\mu^*, \pi^*)$ for OMD.
  • Figure 5: Multiple Equilibria. Two Beach Bars Problem. Params. $\alpha=60, c_2=15, c_1=0.5$. (top) Exploitabilities (bot.) Equilibrium $(\mu^*, \pi^*)$ for FP.
  • ...and 15 more figures

Theorems & Definitions (21)

  • Definition 2.1
  • Definition 4.1
  • Example 1: Move forward
  • Definition 4.2: Contractive MFG
  • Example 2: Coordination game
  • Definition 4.3
  • Definition 4.4: LL-MFG
  • Example 3: Beach Bar Problem
  • Example 4: Two–Beach Bar Coordination MFG
  • Definition 4.5: P-MFG
  • ...and 11 more