Causal Mean Field Multi-Agent Reinforcement Learning

Hao Ma, Zhiqiang Pu, Yi Pan, Boyin Liu, Junlong Gao, Zhenyu Guo

TL;DR

CMFQ tackles scalability in large-scale multi-agent reinforcement learning by embedding causality into mean-field Q-learning. It builds a structural causal model to identify and weight essential pairwise interactions via counterfactual interventions, creating a causality-aware representation $\check{a}^i_{t-1}=\sum_{k} w^{i,k}_t a^{i,k}_{t-1}$. The algorithm demonstrates superior training and test scalability in mixed cooperative-competitive and purely cooperative games, outperforming MFQ and attention-based baselines. This approach offers a flexible framework for incorporating causal inference into MFRL, with potential for broader applicability in large MAS and real-world scenarios.

Abstract

Scalability remains a challenge in multi-agent reinforcement learning and is under active research. Mean-field reinforcement learning (MFRL) alleviates the scalability problem by employing mean-field theory to turn a many-agent problem into a two-agent problem. However, this framework cannot identify essential interactions in nonstationary environments. Causality captures relatively invariant mechanisms behind interactions, even though environments are nonstationary. We therefore propose an algorithm called causal mean-field Q-learning (CMFQ) to address the scalability problem. CMFQ is more robust to changes in the number of agents while inheriting MFRL's compressed representation of the action-state space. First, we model the causality behind the decision-making process of MFRL as a structural causal model (SCM). Then the essential degree of each interaction is quantified by intervening on the SCM. Furthermore, we design a causality-aware compact representation of agents' behavioral information as the weighted sum of all behavioral information according to their causal effects. We test CMFQ in a mixed cooperative-competitive game and a cooperative game. The results show that our method scales well both when training in environments containing a large number of agents and when testing in environments containing many more agents.
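The weighted-sum representation $\check{a}^i_{t-1}=\sum_{k} w^{i,k}_t a^{i,k}_{t-1}$ can be illustrated with a minimal sketch. It assumes one-hot discrete actions and weights that are already given; the function names `one_hot` and `weighted_mean_action` and the example weights are hypothetical and not taken from the paper. With uniform weights the result reduces to the plain mean action used by MFQ.

```python
def one_hot(index, dim):
    """Encode a discrete action index as a one-hot list of length dim."""
    return [1.0 if j == index else 0.0 for j in range(dim)]


def weighted_mean_action(neighbor_actions, weights):
    """Return sum_k w_k * a_k after normalizing the weights to sum to 1.

    neighbor_actions: list of equal-length action vectors, one per neighbor.
    weights: list of non-negative scores, one per neighbor (assumed given).
    """
    if not weights or len(neighbor_actions) != len(weights):
        raise ValueError("need exactly one weight per neighbor")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(neighbor_actions[0])
    merged = [0.0] * dim
    for action, w in zip(neighbor_actions, weights):
        for j in range(dim):
            merged[j] += (w / total) * action[j]
    return merged


# Four neighbors choosing among three discrete actions.
actions = [one_hot(a, 3) for a in (0, 0, 2, 1)]

# Uniform weights recover the ordinary mean action.
uniform = weighted_mean_action(actions, [1, 1, 1, 1])

# Illustrative (made-up) weights that emphasize the third neighbor.
causal = weighted_mean_action(actions, [0.1, 0.1, 0.7, 0.1])
```

In this toy case the merged action shifts toward the action of the heavily weighted neighbor, which is the intended effect of weighting by causal importance.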

Paper Structure

This paper contains 18 sections, 20 equations, 10 figures, 1 table, 1 algorithm.

Figures (10)

  • Figure 1: Blue agents and orange agents belong to different teams. The purple agent denotes a merged agent that simply averages all agents in agent $i$'s neighborhood. The diagram on the left shows a scenario in which the central agent $i$ interacts with many agents; $i_k$ denotes the $k^{th}$ agent in the observation of agent $i$. In the MFRL framework, this scenario is transformed into the diagram in the middle, in which a merged agent characterizes all the agents in the central agent's observation. Our method further enables the central agent to learn to ask "what if". When it asks this question, it can imagine the scenario illustrated in the right diagram: the central agent hypothetically replaces the action of the merged agent in MFRL with the action of a neighborhood agent. If this replacement causes a dramatic change in policy, that neighborhood agent is potentially important, so the central agent should pay more attention to its interaction with that agent.
  • Figure 2: (a) CMFQ's architecture. Each neighborhood agent is assigned a weight according to its causal effect on the policy of the central agent. (b) The causal module. It calculates the $KL$ divergence between two policies: one in which the merged agent is represented by the average action, and one in which it is represented by the $k^{th}$ neighborhood agent's action. A large $KL$ divergence means the $k^{th}$ neighborhood agent might be ignored in the merged agent represented by the average action; hence it should be assigned a higher weight to form a better merged agent.
  • Figure 3: (a) A canonical SCM. When $do(x_0)$ is performed on $X$, all links from the causes of $X$ are cut, and only $X$ is changed, to $x_0$, while all other variables are kept constant. (b) The SCM of MFRL; the $do$-operation on $\bar{a}^i_{t-1}$ follows the same procedure.
  • Figure 4: Win rate during execution. (a) shows the curves of total reward during training for each algorithm. (b) shows the results of the algorithms battling against each other. The horizontal axis is divided into five groups by algorithm, and within each group there are five bars representing the win rate of the algorithm on the horizontal axis. (c) shows the win rates of the algorithms in the label against the MFQ algorithms on the horizontal axis. (d) shows the win rate of CMFQ with different $\epsilon$ against other algorithms.
  • Figure 5: Visualization of CMFQ vs MFQ in the 64 vs 64 environment. Red squares denote CMFQ and blue squares denote MFQ. The vertical bar on the left side of each square indicates its health points, and the surrounding circular area indicates its attack range. When an agent attacks, an arrow extends toward the attack target.
  • ...and 5 more figures
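
The causal module described in the Figure 2 caption can be sketched roughly as follows. This is an illustration under several assumptions, not the authors' implementation: `toy_q_values` is a hypothetical stand-in for a learned Q-function, the policy is assumed to be a softmax over Q-values, the direction of the $KL$ divergence is chosen arbitrarily because the caption does not specify it, and the smoothing constant `eps` is added here only to keep the normalization well defined (its relation to the $\epsilon$ in Figure 4(d) is not stated in this section).

```python
import math


def softmax(values, temperature=1.0):
    """Boltzmann policy over a list of Q-values (assumed policy form)."""
    top = max(values)
    exps = [math.exp((v - top) / temperature) for v in values]
    norm = sum(exps)
    return [e / norm for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def toy_q_values(merged_action):
    """Hypothetical Q(s, a_i, merged_action) for three own actions a_i.

    A fixed linear table stands in for a learned network.
    """
    table = [[1.0, 0.0, 0.0],
             [0.0, 1.0, 0.0],
             [0.0, 0.0, 3.0]]
    return [sum(r * m for r, m in zip(row, merged_action)) for row in table]


def causal_weights(neighbor_actions, q_fn, eps=1e-3):
    """Weight each neighbor by how much intervening with its action
    changes the central agent's policy relative to the mean action."""
    n = len(neighbor_actions)
    dim = len(neighbor_actions[0])
    mean_action = [sum(a[j] for a in neighbor_actions) / n for j in range(dim)]
    base_policy = softmax(q_fn(mean_action))
    scores = []
    for action in neighbor_actions:
        # Intervention: replace the merged (mean) action with this neighbor's action.
        intervened_policy = softmax(q_fn(action))
        scores.append(kl_divergence(intervened_policy, base_policy) + eps)
    total = sum(scores)
    return [s / total for s in scores]


# Four neighbors with one-hot actions 0, 0, 2, 1.
neighbors = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
weights = causal_weights(neighbors, toy_q_values)
```

In this toy setup the third neighbor, whose action shifts the policy most when substituted for the mean action, receives the largest weight, and neighbors with identical actions receive identical weights.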