Bounded Rationality Equilibrium Learning in Mean Field Games

Yannick Eich, Christian Fabian, Kai Cui, Heinz Koeppl

TL;DR

This work tackles learning equilibria in large-scale mean field games (MFGs) when agents are not perfectly rational. It introduces two bounded rationality concepts, quantal response equilibria (QRE) to capture noisy value estimation and receding horizon (RH) MFGs to model limited lookahead, and develops formal fixed-point characterizations for each. The authors connect these notions to existing equilibria such as entropy-regularized Nash equilibria (NE), establish theoretical relations (including a first-order approximation between QRE and regularized equilibria, RE), and propose generalized fixed-point iteration and fictitious play algorithms to learn QRE, RH equilibria, and RE. Experiments on the SIS model, random MFGs, and sequential rock-paper-scissors (RPS) demonstrate convergence and show how bounded rationality shapes equilibrium behavior, yielding scalable learning tools for realistic multi-agent systems.

Abstract

Mean field games (MFGs) tractably model behavior in large agent populations. The literature on learning MFG equilibria typically focuses on finding Nash equilibria (NE), which assume perfectly rational agents and are hence implausible in many realistic situations. To overcome these limitations, we incorporate bounded rationality into MFGs by leveraging the well-known concept of quantal response equilibria (QRE). Two novel types of MFG QRE enable the modeling of large agent populations where individuals only noisily estimate the true objective. We also introduce a second source of bounded rationality to MFGs by restricting agents' planning horizon. The resulting novel receding horizon (RH) MFGs are combined with QRE and existing approaches to model different aspects of bounded rationality in MFGs. We formally define MFG QRE and RH MFGs and compare them to existing equilibrium concepts such as entropy-regularized NE. Subsequently, we design generalized fixed point iteration and fictitious play algorithms to learn QRE and RH equilibria. After a theoretical analysis, we give different examples to evaluate the capabilities of our learning algorithms and outline practical differences between the equilibrium concepts.
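To make the idea of a quantal response fixed point concrete, the following is a minimal sketch on a one-shot toy game, not the paper's algorithm. The crowd-averse reward `r(a, mu) = base[a] - crowd_cost * mu[a]`, the damping parameter `step`, and all function names are assumptions introduced here for illustration. Because the game is static, the mean field coincides with the policy, so a logit QRE is a policy `pi` satisfying `pi = softmax(r(., pi) / alpha)`.

```python
import math


def softmax(logits, alpha):
    """Logit (softmax) response with temperature alpha > 0."""
    scaled = [x / alpha for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def reward(mu, base=(1.0, 0.5, 0.0), crowd_cost=2.0):
    """Hypothetical crowd-averse reward: r(a, mu) = base[a] - crowd_cost * mu[a]."""
    return [b - crowd_cost * m for b, m in zip(base, mu)]


def logit_qre(alpha=1.0, step=0.5, iters=200):
    """Damped iteration pi <- (1 - step) * pi + step * softmax(r(., pi) / alpha)."""
    pi = [1.0 / 3.0] * 3  # start from the uniform policy
    for _ in range(iters):
        target = softmax(reward(pi), alpha)
        pi = [(1.0 - step) * p + step * t for p, t in zip(pi, target)]
    # Fixed-point residual: distance between pi and its own logit response.
    response = softmax(reward(pi), alpha)
    residual = max(abs(p - t) for p, t in zip(pi, response))
    return pi, residual


pi_qre, residual = logit_qre()
print([round(p, 4) for p in pi_qre], residual)
```

With these particular toy parameters the damped update is a contraction, so the residual shrinks geometrically. For smaller temperatures or stronger crowd costs, a smaller `step` may be needed, and convergence is not guaranteed in general.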

Paper Structure

This paper contains 32 sections, 7 theorems, 46 equations, 10 figures, 4 algorithms.

Key Result

Proposition 1

Under the continuity assumption (Assm. ass:cont), an MFNE $\pi^*$ exists and yields an $\epsilon$-NE $\underline{\pi}^* = (\pi^*, \ldots, \pi^*)$ of the finite $N$-agent game, with $\epsilon \to 0$ as $N \to \infty$.

Figures (10)

  • Figure 1: A visualization of the resulting equilibrium policies (one-dimensional for illustration) of QRE and RE over temperature and receding horizon. In the limit of low temperature and infinite horizon, all concepts become the MFNE. In the limit of infinite temperature, all solutions become the constant uniform policy.
  • Figure 2: Convergence of GFP for the Susceptible-Infectious-Susceptible MFG with $\alpha = 1.0$ and $\beta=0.95$. The GFP algorithms for $\text{Q}^{\pi}\text{RE}$, $\text{Q}^{*}\text{RE}$ and RE show similar behaviour in the first iterations before converging to their respective equilibria.
  • Figure 3: Action probabilities in the Rock-Paper-Scissors problem at $t=0$ for the resulting $\text{Q}^{*}\text{RE}$ / $\text{Q}^{\pi}\text{RE}$ / RE using GFP (Alg. \ref{alg2}) over various temperatures $\alpha = 1/\lambda$. As $\alpha \to \infty$, we always obtain the uniform policy in the center, while as $\alpha \to 0$, solutions converge to the Nash solution (to the left). In between, solutions differ from each other, regardless of the temperature.
  • Figure 4: Comparison of the distance of various RH $\text{Q}^{\pi}\text{RE}$ with different horizons $H$ to the $\text{Q}^{\pi}\text{RE}$ with total horizon $\mathcal{T}$ for a random MFG with $\alpha=1.0$ and $\beta=0.95$ over iterations $k$. The equilibria induced by assuming shorter lookahead capacities deviate more from the QRE with total horizon, demonstrating the impact of limited lookahead on equilibrium behavior.
  • Figure 5: Convergence of GFPI for a random MFG with $\alpha = 1.0$.
  • ...and 5 more figures
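The temperature limits described in the captions of Figures 1 and 3 can be illustrated with a plain softmax over fixed action values. This is only a sketch of the single-agent response. In the paper the values depend on the mean field and the limits are statements about equilibria. The action values `q` and the function name `boltzmann_policy` are hypothetical.

```python
import math


def boltzmann_policy(q_values, alpha):
    """Softmax over action values with temperature alpha > 0."""
    scaled = [q / alpha for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


q = [1.0, 0.2, -0.5]  # hypothetical action values for three actions

hot = boltzmann_policy(q, alpha=100.0)   # high temperature: close to uniform
warm = boltzmann_policy(q, alpha=1.0)    # intermediate temperature
cold = boltzmann_policy(q, alpha=0.01)   # low temperature: close to greedy

for name, policy in (("hot", hot), ("warm", warm), ("cold", cold)):
    print(name, [round(p, 3) for p in policy])
```

At high temperature the value differences are washed out and the policy approaches the uniform distribution. At low temperature nearly all probability mass sits on the highest-valued action.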

Theorems & Definitions (23)

  • Definition 1: Exploitability
  • Definition 2: Approximate NE
  • Definition 3: Mean Field NE
  • Proposition 1 (saldi2018markov)
  • Definition 4: RE
  • Definition 5: $\text{Q}^{\pi}\text{RE}$
  • Definition 6: $\text{Q}^{*}\text{RE}$
  • Definition 7: Logit $\text{Q}^{\pi}\text{RE}$
  • Definition 8: Boltzmann equilibrium
  • Proposition 2
  • ...and 13 more
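The statement of Definition 1 (Exploitability) is not reproduced in this listing. A common form in the MFG learning literature measures how much a single agent can gain by deviating to a best response while the population keeps playing the given policy. The sketch below assumes that form and evaluates it on a hypothetical one-shot crowd-averse game with reward `r(a, mu) = base[a] - crowd_cost * mu[a]`. Neither the game nor the function names come from the paper.

```python
def reward(mu, base=(1.0, 0.5, 0.0), crowd_cost=2.0):
    """Hypothetical crowd-averse reward: r(a, mu) = base[a] - crowd_cost * mu[a]."""
    return [b - crowd_cost * m for b, m in zip(base, mu)]


def exploitability(pi):
    """Best-response gain against the mean field induced by pi (static game: mu = pi)."""
    r = reward(pi)
    best_response_value = max(r)
    policy_value = sum(p * x for p, x in zip(pi, r))
    return best_response_value - policy_value


uniform = [1.0 / 3.0] * 3
# For this toy game, equalizing the rewards of all three actions gives this policy.
candidate_ne = [7.0 / 12.0, 4.0 / 12.0, 1.0 / 12.0]

print("uniform:", exploitability(uniform))
print("candidate NE:", exploitability(candidate_ne))
```

Under this assumed definition, a policy is a Nash equilibrium of the toy game exactly when its exploitability is zero, and bounded rationality equilibria such as QRE generally have strictly positive exploitability.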