Causal Mean Field Multi-Agent Reinforcement Learning
Hao Ma, Zhiqiang Pu, Yi Pan, Boyin Liu, Junlong Gao, Zhenyu Guo
TL;DR
CMFQ tackles scalability in large-scale multi-agent reinforcement learning by embedding causality into mean-field Q-learning. It builds a structural causal model to identify and weight essential pairwise interactions via counterfactual interventions, creating a causality-aware representation $\check{a}^i_{t-1}=\sum_{k} w^{i,k}_t a^{i,k}_{t-1}$, where $a^{i,k}_{t-1}$ is the previous action of agent $i$'s $k$-th neighbor and $w^{i,k}_t$ is its causal weight. The algorithm shows superior scalability in both training and testing in mixed cooperative-competitive and purely cooperative games, outperforming MFQ and attention-based baselines. This approach offers a flexible framework for incorporating causal inference into MFRL, with potential for broader applicability in large multi-agent systems and real-world scenarios.
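The compact representation above is a weighted average of the neighbors' (one-hot) actions. The NumPy sketch below assumes the weights for agent $i$ are non-negative and sum to one; under that assumption, uniform weights recover the ordinary mean action used by MFQ, while causal weights shift mass toward neighbors judged more essential. The helper name `causal_mean_action` and the example weights are hypothetical, chosen only for illustration.

```python
import numpy as np


def causal_mean_action(neighbor_actions, weights):
    """Return sum_k w^{i,k} * a^{i,k} over agent i's neighbors (hypothetical helper)."""
    neighbor_actions = np.asarray(neighbor_actions, dtype=float)  # (K, n_actions), one-hot rows
    weights = np.asarray(weights, dtype=float)                    # (K,), one weight per neighbor
    if neighbor_actions.shape[0] != weights.shape[0]:
        raise ValueError("expected one weight per neighbor")
    # Assumption for this sketch: weights form a probability vector.
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights are assumed to be non-negative and sum to 1")
    return weights @ neighbor_actions  # (n_actions,)


# Three neighbors choosing among four discrete actions, one-hot encoded.
actions = np.eye(4)[[0, 2, 2]]

# Uniform weights reproduce MFQ's ordinary mean action.
mean_field = causal_mean_action(actions, np.full(3, 1 / 3))      # approx. [1/3, 0, 2/3, 0]

# Made-up causal weights that emphasize the first neighbor.
causal = causal_mean_action(actions, np.array([0.6, 0.2, 0.2]))  # approx. [0.6, 0, 0.4, 0]
```

Because both results are probability vectors over the same action set, the causality-aware representation can be fed to a Q-network wherever MFQ would use the plain mean action; only the weighting changes.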
Abstract
Scalability remains a challenge in multi-agent reinforcement learning and is an active area of research. A framework named mean-field reinforcement learning (MFRL) can alleviate the scalability problem by using mean field theory to turn a many-agent problem into a two-agent problem. However, this framework cannot identify essential interactions in nonstationary environments. Although environments are nonstationary, the causal mechanisms behind interactions remain relatively invariant. We therefore propose an algorithm called causal mean-field Q-learning (CMFQ) to address the scalability problem. CMFQ is more robust to changes in the number of agents while inheriting MFRL's compressed representation of the state-action space. First, we model the causality behind the decision-making process of MFRL as a structural causal model (SCM). Then, we quantify how essential each interaction is by intervening on the SCM. Finally, we design a causality-aware compact representation of the agents' behavioral information as the sum of all behavioral information, weighted by its causal effect. We test CMFQ in a mixed cooperative-competitive game and a cooperative game. The results show that our method scales well both when training in environments containing a large number of agents and when testing in environments containing many more agents.
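The intervention step can be illustrated with a toy model. This sketch is not necessarily the paper's procedure: it assumes agent $i$ acts with a Boltzmann policy over a hypothetical Q-function `q_fn(state, mean_action)` in the style of MFQ, intervenes on one neighbor at a time by replacing that neighbor's one-hot action with a uniform distribution (an assumed intervention value), scores each neighbor by the KL divergence between the factual and interventional policies (an assumed effect measure), and normalizes the scores into weights. The names `intervention_weights` and `toy_q` and the random weight matrices are hypothetical.

```python
import numpy as np


def softmax(x, beta=1.0):
    """Boltzmann distribution over Q-values with inverse temperature beta."""
    z = beta * (x - x.max())
    e = np.exp(z)
    return e / e.sum()


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions, clipped to avoid log(0)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))


def intervention_weights(q_fn, state, neighbor_actions, beta=1.0):
    """Hypothetical causal weights from do(a^{i,k} := uniform), one neighbor at a time."""
    neighbor_actions = np.asarray(neighbor_actions, dtype=float)
    n_neighbors, n_actions = neighbor_actions.shape
    factual_policy = softmax(q_fn(state, neighbor_actions.mean(axis=0)), beta)
    baseline = np.full(n_actions, 1.0 / n_actions)  # assumed intervention value
    effects = np.empty(n_neighbors)
    for k in range(n_neighbors):
        intervened = neighbor_actions.copy()
        intervened[k] = baseline  # intervene on neighbor k only; others unchanged
        cf_policy = softmax(q_fn(state, intervened.mean(axis=0)), beta)
        effects[k] = kl_divergence(factual_policy, cf_policy)
    total = effects.sum()
    if total == 0.0:  # no neighbor changes the policy: fall back to uniform weights
        return np.full(n_neighbors, 1.0 / n_neighbors)
    return effects / total


# Toy linear Q-function with random (hypothetical) parameters: 4 own actions, 3 state features.
rng = np.random.default_rng(0)
W_s = rng.normal(size=(4, 3))
W_a = rng.normal(size=(4, 4))


def toy_q(state, mean_action):
    return W_s @ state + W_a @ mean_action


state = np.array([0.5, -1.0, 0.2])
actions = np.eye(4)[[0, 2, 2, 3]]  # four neighbors, one-hot previous actions
w = intervention_weights(toy_q, state, actions)
check_a = w @ actions  # causality-aware weighted sum of neighbor actions
```

Neighbors 1 and 2 take the same action, so in this toy model they receive identical weights. Normalizing the raw effects (rather than, for example, applying a softmax to them) is a choice made here for interpretability: a neighbor whose intervention leaves the policy unchanged gets zero weight.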
