Deep Reinforcement Learning for Infinite Horizon Mean Field Problems in Continuous Spaces

Andrea Angiuli, Jean-Pierre Fouque, Ruimeng Hu, Alan Raydan

TL;DR

This work develops IH-MF-AC, an online, model-free actor-critic algorithm that combines a score-based representation of the mean field distribution with Langevin sampling to solve infinite-horizon, continuous-space mean field game (MFG) and mean field control (MFC) problems. Depending on the relative learning rates of the actor, critic, and score networks, the method converges to either the MFG equilibrium or the MFC optimum, and a modification extends it to mixed mean field control games (MFCGs). The approach is evaluated on linear-quadratic benchmarks, whose analytic MFG and MFC solutions serve as references, and the experiments show recovery of the optimal mean field distributions and controls, with observations on stability and exploration. The work advances model-free techniques for large-population mean field problems in continuous spaces and leaves finite-horizon extensions and rigorous convergence guarantees to future research.

Abstract

We present the development and analysis of a reinforcement learning (RL) algorithm designed to solve continuous-space mean field game (MFG) and mean field control (MFC) problems in a unified manner. The proposed approach pairs the actor-critic (AC) paradigm with a representation of the mean field distribution via a parameterized score function, which can be efficiently updated in an online fashion, and uses Langevin dynamics to obtain samples from the resulting distribution. The AC agent and the score function are updated iteratively to converge, either to the MFG equilibrium or the MFC optimum for a given mean field problem, depending on the choice of learning rates. A straightforward modification of the algorithm allows us to solve mixed mean field control games (MFCGs). The performance of our algorithm is evaluated using linear-quadratic benchmarks in the asymptotic infinite horizon framework.
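
The sampling step mentioned in the abstract can be illustrated with a minimal sketch of unadjusted Langevin dynamics. Everything specific here is an assumption made for illustration: a closed-form Gaussian score stands in for the learned, parameterized score function (written $\Sigma_{\varphi_n}$ in the figure captions), and the names `gaussian_score` and `langevin_samples`, the step size, and the number of steps are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def gaussian_score(x, mean, std):
    """Score (gradient of the log-density) of N(mean, std^2).

    Assumption: a closed-form stand-in for a learned score network.
    """
    return -(x - mean) / std**2


def langevin_samples(score, x0, step=1e-2, n_steps=2000, rng=None):
    """Unadjusted Langevin dynamics driven only by a score function.

    Each particle follows x <- x + (step / 2) * score(x) + sqrt(step) * noise,
    whose stationary law approximates the density behind `score` for small step.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + 0.5 * step * score(x) + np.sqrt(step) * noise
    return x


rng = np.random.default_rng(0)
x0 = rng.standard_normal(5000)  # arbitrary initial particles
samples = langevin_samples(lambda x: gaussian_score(x, 1.5, 0.5), x0, rng=rng)
print(samples.mean(), samples.std())  # should land near the target mean and std
```

The point of the sketch is that the sampler never evaluates a density: the score alone is enough to generate samples, which is why a score representation can be updated online and still be sampled from.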

Paper Structure

This paper contains 27 sections, 82 equations, 19 figures, 6 tables, 2 algorithms.

Figures (19)

  • Figure 1: Running cost coefficients and volatility for \ref{eq: lq cost} and \ref{eq: lq dynamics}. The results for this parameter set are displayed in \ref{fig: results 1}, \ref{fig: mean errors 1}, \ref{fig: values 1}, \ref{fig: value errors 1}, and \ref{fig: control errors 1}.
  • Figure 2: We plot the absolute error between the mean of samples produced from the parameterized score function $\Sigma_{\varphi_n}$ and the optimal mean $\hat{m}$ in the case of MFG (left) and $m^*$ in the case of MFC (right). These values were averaged over five runs, each with different random initial samples, with the standard deviation given by the light blue shaded region. Large jumps are due to random outliers, which result from the stochasticity of our algorithm.
  • Figure 3: The orange and green curves are the optimal value functions for the MFG and MFC problems, respectively. The blue dashed line is the learned value function, given by the negative of the critic $V_{\theta_N}$ after $N = 10^6$ iterations, averaged over five runs with different initial samples. Since the original optimization problem aims to minimize cost while our algorithm seeks to maximize reward, we take the negative of the critic function to make the problems equivalent. The light blue shaded region depicts one standard deviation from the learned value.
  • Figure 4: We plot the expected absolute error between the learned value function $V_{\theta_n}$ and the optimal value function $\hat{v}$ in the case of MFG (left) and $v^*$ in the case of MFC (right). These plots were averaged over five runs, each with different random initial samples, with the standard deviation given by the light blue shaded region. Large jumps are due to random outliers, which result from the stochasticity of our algorithm.
  • Figure 5: We plot the expected error between the learned control function $\alpha_n = \mathbb{E}[\Pi_{\psi_n}]$ and the optimal control $\hat{\alpha}$ in the case of MFG (left) and $\alpha^*$ in the case of MFC (right). These plots were averaged over five runs, each with different random initial samples, with the standard deviation given by the light blue shaded region. Large jumps are due to random outliers, which result from the stochasticity of our algorithm.
  • ...and 14 more figures