Mean Field Multi-Agent Reinforcement Learning

Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, Jun Wang

TL;DR

The paper tackles the scalability challenge of multi-agent reinforcement learning by introducing Mean Field Reinforcement Learning, which replaces explicit interactions among many agents with a mean-field term representing the average neighbor influence. It develops two practical algorithms, MF-Q and MF-AC, and provides convergence analysis showing learning to Nash equilibria under reasonable assumptions. Empirical results across Gaussian Squeeze, the Ising model, and a large-scale battle game demonstrate that MF-MARL learns effective policies and outperforms strong baselines, including solving the Ising model with a model-free approach. This work significantly advances scalable, theory-backed learning in large-population multi-agent systems and offers a practical path to applications with hundreds to thousands of agents.

Abstract

Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. When the number of agents grows large, learning becomes intractable due to the curse of dimensionality and the exponential growth of agent interactions. In this paper, we present Mean Field Reinforcement Learning, where the interactions within the population of agents are approximated by those between a single agent and the average effect of the overall population or neighboring agents. The interplay between the two entities is mutually reinforcing: the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution to Nash equilibrium. Experiments on Gaussian squeeze, the Ising model, and battle games demonstrate the learning effectiveness of our mean field approaches. In addition, we report the first result of solving the Ising model via model-free reinforcement learning methods.
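To make the "average effect from neighboring agents" concrete, here is a minimal sketch of one way to summarize a neighborhood. It assumes discrete actions encoded as integers, so that averaging the neighbors' one-hot action vectors yields an empirical action distribution. The function name `mean_action` and the integer encoding are illustrative assumptions, not taken from the paper's code.

```python
from collections import Counter


def mean_action(neighbor_actions, num_actions):
    """Average the one-hot encodings of the neighbors' actions.

    Each action is an integer in [0, num_actions). The result is the
    empirical distribution of actions in the neighborhood.
    (Hypothetical helper for illustration only.)
    """
    if not neighbor_actions:
        raise ValueError("need at least one neighbor")
    counts = Counter(neighbor_actions)
    n = len(neighbor_actions)
    return [counts.get(a, 0) / n for a in range(num_actions)]


# Four grid neighbors, two actions (0 = down, 1 = up): two of each.
a_bar = mean_action([1, 0, 0, 1], num_actions=2)
print(a_bar)
```

An agent conditioning on `a_bar` sees a fixed-size summary regardless of how many neighbors it has, which is what keeps the input to its value function from growing with the population.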

Paper Structure

This paper contains 23 sections, 6 theorems, 45 equations, 6 figures.

Key Result

Lemma 1

Under Assumption 2nasheq, the Nash operator $\mathcal{H}^{\mathop{}\!\mathtt{Nash}}$ in Eq. nashop forms a contraction mapping on the complete metric space from $\mathcal{Q}$ to $\mathcal{Q}$, with the fixed point being the Nash $Q$-value of the entire game, i.e. $\mathcal{H}^{\mathop{}\!\mathtt{Nash}} \ldots$
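The practical meaning of a contraction mapping is that repeatedly applying the operator converges to a single fixed point from any starting value. The sketch below shows this with a scalar toy operator $H(q) = r + \gamma q$. This is not the Nash operator from the lemma; the operator, the constants `GAMMA` and `R`, and the helper name are all assumptions chosen for illustration.

```python
def iterate_to_fixed_point(operator, q0, tol=1e-10, max_iters=10_000):
    """Apply `operator` repeatedly until successive iterates differ by < tol."""
    q = q0
    for _ in range(max_iters):
        q_next = operator(q)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    raise RuntimeError("did not converge within max_iters")


GAMMA, R = 0.9, 1.0  # assumed toy constants; GAMMA < 1 makes H a contraction


def toy_operator(q):
    # |H(q) - H(q')| = GAMMA * |q - q'|, so distances shrink by GAMMA each step.
    return R + GAMMA * q


# Two very different starting points end up at the same fixed point R / (1 - GAMMA).
q_from_zero = iterate_to_fixed_point(toy_operator, 0.0)
q_from_far = iterate_to_fixed_point(toy_operator, 500.0)
print(q_from_zero, q_from_far)
```

Both runs approach $r / (1 - \gamma) = 10$, which illustrates why a contraction property is useful in a convergence argument: the limit does not depend on initialization.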

Figures (6)

  • Figure 1: MF-$Q$ iterations on a $3\times 3$ stateless toy example. The goal is to coordinate the agents on an agreed direction. Each agent has two actions: up $\uparrow$ or down $\downarrow$. An agent that moves in the same direction as $0, 1, 2, 3, 4$ of its neighbors receives a reward of $-2.0, -1.0, 0.0, 1.0, 2.0$, respectively. The neighbors are the four adjacent cells on the grid, which wraps around in every direction, e.g. the first row and the third row are adjacent. The reward for the highlighted agent $j$ at the bottom left at time $t+1$ is $2.0$, as all neighboring agents stay down at the same time. We list the $Q$-tables for agent $j$ at three time steps, where $\bar{a}^j$ is the percentage of neighboring ups. Following the MF-$Q$ update rule, we have $Q^j_{t+1}(\uparrow, \bar{a}^j=0) = Q^j_t(\uparrow, \bar{a}^j=0) + \alpha [ r^j - Q^j_t(\uparrow, \bar{a}^j=0)] = 0.82 + 0.1\times (2.0-0.82) = 0.93$. The rightmost plot shows the converged scenario, where the $Q$-value of staying down is $2.0$, the largest reward in the environment.
  • Figure 2: Learning with $N$ agents in the GS environment with $\mu=400$ and $\sigma=200$.
  • Figure 3: The order parameter at equilibrium vs. temperature in the Ising model on a $20\times 20$ grid.
  • Figure 5: The spins of the Ising model at equilibrium under different temperatures.
  • Figure 6: The battle game: $64$ vs. $64$.
  • ...and 1 more figure
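The single update step quoted in the Figure 1 caption can be reproduced with a few lines of code. The sketch below assumes the stateless form $Q \leftarrow Q + \alpha (r - Q)$ given in the caption and the reward table listed there. The names `reward`, `q_update`, and the integer action encoding are hypothetical. With the rounded inputs $0.82$, $\alpha = 0.1$, and $r = 2.0$, the arithmetic gives $0.938$; the caption reports $0.93$, possibly because the tabulated $0.82$ is itself a rounded value.

```python
# Reward for agreeing with k of the 4 neighbors (k = 0..4), per the Figure 1 caption.
REWARD_BY_AGREEMENT = [-2.0, -1.0, 0.0, 1.0, 2.0]

DOWN, UP = 0, 1  # assumed integer encoding of the two actions


def reward(own_action, neighbor_actions):
    """Look up the reward from the number of neighbors sharing our direction."""
    agree = sum(1 for a in neighbor_actions if a == own_action)
    return REWARD_BY_AGREEMENT[agree]


def q_update(q_old, r, alpha):
    """Stateless tabular update: Q <- Q + alpha * (r - Q)."""
    return q_old + alpha * (r - q_old)


# All four neighbors stay down, and so does the agent: maximal reward.
r = reward(DOWN, [DOWN, DOWN, DOWN, DOWN])
q_new = q_update(0.82, r, alpha=0.1)
print(r, round(q_new, 3))
```

Repeating `q_update` with the same reward moves the entry toward $2.0$, consistent with the converged value described for the rightmost plot.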

Theorems & Definitions (12)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Proposition 1
  • proof
  • Theorem 1
  • proof
  • Proposition 1
  • proof
  • ...and 2 more