Networked Communication for Mean-Field Games with Function Approximation and Empirical Mean-Field Estimation

Patrick Benjamin, Alessandro Abate

TL;DR

This work extends mean-field game learning to function approximation in a decentralized, online, non-episodic setting. It introduces Munchausen Online Mirror Descent with networked policy adoption and two mean-field estimation schemes (general and visibility-based) to enable population-dependent policies without full global observability. Theoretical results show networked agents can learn faster than central learners under certain assumptions, and experiments demonstrate scalability to large state spaces with improved performance over independent and central-agent baselines. Overall, the paper advances practical, scalable, decentralized MFG algorithms that leverage inter-agent communication to improve learning and mean-field estimation.

Abstract

Recent algorithms allow decentralised agents, possibly connected via a communication network, to learn equilibria in mean-field games from a non-episodic run of the empirical system. However, these algorithms are for tabular settings: this computationally limits the size of agents' observation space, meaning the algorithms cannot handle anything but small state spaces, nor generalise beyond policies depending only on the agent's local state to so-called 'population-dependent' policies. We address this limitation by introducing function approximation to the existing setting, drawing on the Munchausen Online Mirror Descent method that has previously been employed only in finite-horizon, episodic, centralised settings. While this permits us to include the mean field in the observation for players' policies, it is unrealistic to assume decentralised agents have access to this global information: we therefore also provide new algorithms allowing agents to locally estimate the global empirical distribution, and to improve this estimate via inter-agent communication. We prove theoretically that exchanging policy information helps networked agents outperform both independent and even centralised agents in function-approximation settings. Our experiments demonstrate this happening empirically, and show that the communication network allows decentralised agents to estimate the mean field for population-dependent policies.
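
The idea of locally estimating the global empirical distribution and refining it through communication can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the exchange of `{agent_id: state}` maps, and the rule of spreading the mass of unobserved agents uniformly over states are all assumptions made for illustration.

```python
from collections import Counter


def empirical_mean_field(states, num_states):
    """Global empirical distribution: fraction of agents in each state."""
    counts = Counter(states)
    n = len(states)
    return [counts.get(s, 0) / n for s in range(num_states)]


def merge_observations(own, received):
    """One hypothetical communication round: union of {agent_id: state} maps."""
    merged = dict(own)
    for obs in received:
        merged.update(obs)
    return merged


def estimate_mean_field(observed, num_agents, num_states):
    """Estimate the mean field from a partial {agent_id: state} map.

    Assumption (illustrative only): agents that were not observed are
    spread uniformly over all states.
    """
    counts = Counter(observed.values())
    unseen = num_agents - len(observed)
    return [
        (counts.get(s, 0) + unseen / num_states) / num_agents
        for s in range(num_states)
    ]


# Four agents on three states; agent states indexed by agent id.
true_states = [0, 0, 1, 2]
truth = empirical_mean_field(true_states, num_states=3)

# An agent that sees only agents 0 and 1 starts with a coarse estimate.
local_view = {0: 0, 1: 0}
before = estimate_mean_field(local_view, num_agents=4, num_states=3)

# A neighbour shares what it sees, and the estimate is recomputed.
neighbour_view = {2: 1, 3: 2}
shared_view = merge_observations(local_view, [neighbour_view])
after = estimate_mean_field(shared_view, num_agents=4, num_states=3)
```

In this toy case the estimate after one exchange coincides with the true empirical distribution, because the two views together cover every agent; with partial coverage the estimate would only move closer to it.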

Paper Structure (33 sections, 1 theorem, 14 equations, 9 figures, 1 table, 3 algorithms)

Key Result

Theorem 6.3

Given the assumptions labelled single_policy_assumption and approximation_ordering_assumption, networked populations will, in expectation, increase their returns faster than central-agent ones.

Figures (9)

  • Figure 1: 'Target agreement', population-independent, $100\times 100$ grid. Reproduced larger in Fig. 3. The networked populations of all broadcast radii significantly outperform the central-agent and independent populations in terms of average return; the latter two hardly appear to learn at all.
  • Figure 2: 'Cluster', population-independent, $100\times 100$ grid. Larger version in Fig. 4. Networked populations of all broadcast radii outperform the central-agent and independent populations with respect to average return; independent agents hardly learn at all.
  • Figure 3: Larger version of Fig. 1. 'Target agreement', population-independent, $100\times 100$ grid. The networked populations of all broadcast radii significantly outperform the central-agent and independent populations in terms of average return; the latter two hardly appear to learn at all.
  • Figure 4: Larger version of Fig. 2. 'Cluster', population-independent, $100\times 100$ grid. Networked populations of all broadcast radii outperform the central-agent and independent populations with respect to average return; independent agents hardly appear to learn at all.
  • Figure 5: 'Target agreement' task, population-independent policies, $50\times 50$ grid. The networked populations of all broadcast radii significantly outperform the central-agent and independent populations in terms of average return; the latter two hardly appear to learn at all.
  • ...and 4 more figures

Theorems & Definitions (10)

  • Definition 3.1: N-player symmetric anonymous games
  • Definition 3.2: Induced mean-field flow
  • Definition 3.3: Mean-field discounted return
  • Definition 3.4: Best-response (BR) policy
  • Definition 3.5: Mean-field Nash equilibrium (MFNE)
  • Definition 3.6: Time-varying network
  • Definition 3.7: Time-varying state-visibility graph
  • Definition 4.1: Empirical loss for Q-network
  • Theorem 6.3
  • Definition A.1: Exploitability of $\pi$
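
For orientation, exploitability in mean-field games is commonly written in the form sketched below. The notation is assumed rather than taken from the paper: $V(\pi', \mu)$ stands for the mean-field discounted return (Definition 3.3) of policy $\pi'$ against a mean-field flow $\mu$, and $I(\pi)$ for the flow induced by $\pi$ (Definition 3.2).

```latex
\mathcal{E}(\pi) \;=\; \max_{\pi'} V\bigl(\pi', I(\pi)\bigr) \;-\; V\bigl(\pi, I(\pi)\bigr)
```

Under this form $\mathcal{E}(\pi) \geq 0$, and $\mathcal{E}(\pi) = 0$ exactly when $\pi$ is a best response (Definition 3.4) to its own induced flow, which is the condition for a mean-field Nash equilibrium (Definition 3.5).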