Efficient and Scalable Deep Reinforcement Learning for Mean Field Control Games

Nianli Peng, Yilin Wang

TL;DR

This work targets scalable solutions to Mean Field Control Games (MFCGs) by reframing the infinite-agent problem as a Markov decision process and applying deep reinforcement learning (DRL). It builds an actor-critic framework augmented with distribution learning via score matching to track the evolving mean field, and introduces batching, target networks, and further DRL techniques (PPO and GAE) to stabilize and accelerate training. On a linear-quadratic benchmark with a known analytic solution, the variants IH-MFCG-AC-B and IH-MFCG-AC-M converge substantially faster and track the optimum more closely than the baseline, while IH-MFCG-AC-DRL shows mixed performance, highlighting both the potential and the limits of PPO/GAE in this setting. The approach lays groundwork for tackling richer, real-world MFCGs (e.g., autonomous transportation, multi-firm economics) where PDE methods are intractable and analytical solutions are unavailable.

Abstract

Mean Field Control Games (MFCGs) provide a powerful theoretical framework for analyzing systems of infinitely many interacting agents, blending elements from Mean Field Games (MFGs) and Mean Field Control (MFC). However, solving the coupled Hamilton-Jacobi-Bellman and Fokker-Planck equations that characterize MFCG equilibria remains a significant computational challenge, particularly in high-dimensional or complex environments. This paper presents a scalable deep Reinforcement Learning (RL) approach to approximate equilibrium solutions of MFCGs. Building on previous work, we reformulate the infinite-agent stochastic control problem as a Markov Decision Process in which a representative agent interacts with the evolving mean field distribution. We use the actor-critic algorithm from a previous paper (Angiuli et al., 2024) as the baseline and propose several more scalable and efficient variants, using parallel sample collection (batching), mini-batching, target networks, proximal policy optimization (PPO), generalized advantage estimation (GAE), and entropy regularization. These techniques improve the efficiency, scalability, and training stability of the baseline algorithm. We evaluate our method on a linear-quadratic benchmark problem for which an analytical solution to the MFCG equilibrium is available. Our results show that some versions of our proposed approach converge faster and closely approximate the theoretical optimum, outperforming the baseline algorithm by an order of magnitude in sample efficiency. Our work lays the foundation for adapting deep RL to more complicated MFCGs that arise in practice, such as large-scale autonomous transportation systems, multi-firm economic competition, and inter-bank borrowing problems.
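Two of the techniques named in the abstract, GAE and the PPO clipped surrogate, can be illustrated with a minimal sketch. This is the generic textbook form of both, not the paper's implementation: the function names, the default hyperparameters (gamma, lam, clip_eps), and the reward-maximizing sign convention are all assumptions made for illustration, and the paper's infinite-horizon, cost-based setting may differ.

```python
# Illustrative sketch only: standard GAE and PPO-clip formulas.
# Names and default hyperparameters are hypothetical, not from the paper.

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one trajectory.

    `values` must hold len(rewards) + 1 entries; the last entry is the
    bootstrap value estimate for the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step reuses the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        td_error = rewards[t] + gamma * values[t + 1] - values[t]
        running = td_error + gamma * lam * running
        advantages[t] = running
    return advantages


def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO surrogate: min of unclipped and clipped terms.

    `ratio` is new_policy_prob / old_policy_prob for the sampled action.
    """
    clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    adv = gae_advantages([1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0)
    print(adv)
    print(ppo_clipped_objective(1.5, 1.0))
```

With lam set to 0 the estimator reduces to the one-step TD error, and with lam set to 1 it becomes the discounted return minus the value baseline; intermediate values trade bias against variance. The clipping step removes the incentive to move the probability ratio outside the interval [1 - clip_eps, 1 + clip_eps] in the direction that would increase the objective.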

Paper Structure (40 sections, 20 equations, 5 figures, 4 algorithms)

Figures (5)

  • Figure 1: Learned global/local distributions and value function using IH-MFCG-AC vs. the theoretical solution
  • Figure 2: Learned global/local distributions and value function using IH-MFCG-AC-B vs. the theoretical solution
  • Figure 3: Learned global/local distributions and value function using IH-MFCG-AC-M vs. the theoretical solution
  • Figure 4: Learned global/local distributions and value function using IH-MFCG-AC-DRL vs. the theoretical solution
  • Figure 5: