Convergence of Actor-Critic Learning for Mean Field Games and Mean Field Control in Continuous Spaces

Jean-Pierre Fouque, Mathieu Laurière, Mengrui Zhang

TL;DR

The work delivers a rigorous convergence analysis for a deep actor-critic method solving infinite-horizon mean-field problems in continuous spaces, distinguishing between Mean Field Game and Mean Field Control regimes via a two-timescale learning-rate scheme. A discretization-bin approach is introduced for the mean-field control limit, and the analysis extends to Mean Field Control Games, with both idealized three-time-scale results and practical stochastic-approximation algorithms. Theoretical results show convergence to MF equilibria (MFG) or MF optima (MFC), complemented by extensive numerical validation on linear-quadratic benchmarks in 1D and 2D. The study provides a unified, scalable framework for learning in large populations and demonstrates the practicality of IH-MF-AC and IH-MFCG-AC for complex cooperative-competitive settings. Future work includes extending the methods to finite-horizon problems and exploring broader function-approximation schemes.

Abstract

We establish the convergence of the deep actor-critic reinforcement learning algorithm presented in [Angiuli et al., 2023a] in the setting of continuous state and action spaces with an infinite discrete-time horizon. This algorithm provides solutions to Mean Field Game (MFG) or Mean Field Control (MFC) problems depending on the ratio between two learning rates: one for the value function and the other for the mean field term. In the MFC case, to rigorously identify the limit, we introduce a discretization of the state and action spaces, following the approach used in the finite-space case in [Angiuli et al., 2023b]. The convergence proofs rely on a generalization of the two-timescale framework introduced in [Borkar, 1997]. We further extend our convergence results to Mean Field Control Games, which involve locally cooperative and globally competitive populations. Finally, we present numerical experiments for linear-quadratic problems in one and two dimensions, for which explicit solutions are available.
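As a reminder of the mechanism the abstract refers to, a generic two-timescale stochastic-approximation recursion in the spirit of [Borkar, 1997] can be sketched as follows. The symbols $x_n$, $y_n$, $h$, $g$, $a_n$, $b_n$, $M_n$, $N_n$ and $\lambda$ are illustrative notation, not the paper's; which of the learned quantities (value function or mean field term) plays the fast role is what the ratio of the two learning rates decides.

```latex
% Generic two-timescale recursion (illustrative notation, not the paper's):
% x_n is the fast iterate, y_n the slow one; M_n, N_n are martingale-difference noises.
\begin{align*}
  x_{n+1} &= x_n + a_n \bigl[ h(x_n, y_n) + M_{n+1} \bigr], \\
  y_{n+1} &= y_n + b_n \bigl[ g(x_n, y_n) + N_{n+1} \bigr],
\end{align*}
% Standard step-size conditions:
\[
  \sum_n a_n = \sum_n b_n = \infty, \qquad
  \sum_n \bigl( a_n^2 + b_n^2 \bigr) < \infty, \qquad
  \frac{b_n}{a_n} \to 0 .
\]
% Heuristic limit: the fast iterate tracks the equilibrium \lambda(y) of
% \dot{x} = h(x, y) for frozen y, and the slow iterate follows
% \dot{y} = g(\lambda(y), y).
```

Because $b_n / a_n \to 0$, the fast iterate sees the slow one as quasi-static, so swapping which quantity is updated with the larger rate changes the limiting ODE and hence the limit point that is learned.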

Paper Structure

This paper contains 31 sections, 8 theorems, 72 equations, 7 figures, 6 tables, 3 algorithms.

Key Result

Proposition 5.6

Under Assumptions ACM1, ACM2, ACM3, squaresumlearningrate_mfg and MFG_ACGASE, for fixed $\mu$, as $n\to\infty$, $\theta_n \to \theta^*_\mu$ and $\psi_n \to \psi^*_\mu$, where $(\theta^*_\mu,\psi^*_\mu)$ is the globally asymptotically stable equilibrium (GASE) from Assumption MFG_ACGASE.

Figures (7)

  • Figure 1: The histogram is the learned asymptotic distribution and the dashed line is the learned feedback control after $N=2\times10^5$ iterations. The green curves correspond to the optimal control and mean field distribution for MFC, while the orange curves are the equivalent for MFG. The bottom axis shows the state variable $x$, the left axis refers to the value of the control $\alpha(x)$, and the right axis represents the probability density for $\mu$.
  • Figure 2: The orange and green curves are the optimal value functions for the MFG and MFC problems, respectively. The blue dashed line is the learned value function across all bins after $N=2\times10^5$ iterations. It is a polyline for MFC, since we use bins
  • Figure 3: The scattered points are samples from the learned asymptotic distribution after $N=10^6$ iterations. The solid ellipses are the sets of points at Mahalanobis distance $3$ from the optimal mean field distributions in the case of MFG (orange) and MFC (green).
  • Figure 4: The orange and green vector fields are the optimal controls for the MFG and MFC problems, respectively. The blue vector field shows the learned feedback controls after $N=10^6$ iterations.
  • Figure 5: The right-hand side plots are the optimal value functions for the MFG and MFC problems, respectively. The left-hand side plots show the learned functions after $N=10^6$ iterations.
  • ...and 2 more figures

Theorems & Definitions (16)

  • Remark 4.1
  • Proposition 5.6
  • proof
  • Lemma 5.8
  • Proposition 5.9
  • proof
  • Theorem 5.10
  • proof
  • Proposition 5.12
  • proof
  • ...and 6 more