AgentMixer: Multi-Agent Correlated Policy Factorization
Zhiyuan Li, Wenshuai Zhao, Lijun Wu, Joni Pajarinen
TL;DR
AgentMixer tackles coordination in cooperative MARL by enabling correlated policies under partial observability. It introduces a Policy Modifier that constructs a correlated joint policy, and Individual-Global-Consistency, which aligns the modes of the joint and individual policies so that execution can remain decentralized. The authors prove convergence to an $\epsilon$-approximate Correlated Equilibrium and validate the approach on MA-MuJoCo, SMAC-v2, Matrix Game, and Predator-Prey, where it matches or surpasses state-of-the-art methods. This work advances coordination in CTDE settings and offers a practical framework for scalable multi-agent systems.
Abstract
In multi-agent reinforcement learning, centralized training with decentralized execution (CTDE) methods typically assume that agents make decisions independently, based only on their local observations, which may fail to yield a coordinated, correlated joint policy. Coordination can be explicitly encouraged during training, and individual policies can be trained to imitate the correlated joint policy. However, this may lead to an \textit{asymmetric learning failure} due to the observation mismatch between the joint and individual policies. Inspired by the concept of correlated equilibrium, we introduce a \textit{strategy modification} called AgentMixer that allows agents to correlate their policies. AgentMixer non-linearly combines individual partially observable policies into a joint fully observable policy. To enable decentralized execution, we introduce \textit{Individual-Global-Consistency} to guarantee mode consistency during joint training of the centralized and decentralized policies, and we prove that AgentMixer converges to an $\epsilon$-approximate Correlated Equilibrium. On the Multi-Agent MuJoCo, SMAC-v2, Matrix Game, and Predator-Prey benchmarks, AgentMixer outperforms or matches state-of-the-art methods.
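To make the gap between independent and correlated policies concrete, the following minimal sketch uses a hypothetical 2x2 cooperative matrix game. It is not one of the paper's benchmarks and does not implement AgentMixer. The reward table and all function names (`expected_reward`, `marginals`, `product_policy`, `ce_gap`) are assumptions made for illustration. `ce_gap` measures the largest expected gain an agent can obtain by remapping its recommended actions, so a joint distribution is an $\epsilon$-approximate correlated equilibrium, under this standard swap-deviation definition, when the gap is at most $\epsilon$.

```python
# Hypothetical 2x2 cooperative game (an assumption for illustration):
# both agents share reward 1 when their actions match, 0 otherwise.
REWARD = [[1.0, 0.0],
          [0.0, 1.0]]
N = 2  # number of actions per agent


def expected_reward(joint):
    """Expected shared reward under a joint distribution joint[a1][a2]."""
    return sum(joint[a1][a2] * REWARD[a1][a2]
               for a1 in range(N) for a2 in range(N))


def marginals(joint):
    """Per-agent action marginals of a joint distribution."""
    p1 = [sum(joint[a1][a2] for a2 in range(N)) for a1 in range(N)]
    p2 = [sum(joint[a1][a2] for a1 in range(N)) for a2 in range(N)]
    return p1, p2


def product_policy(p1, p2):
    """Joint distribution induced by two independent individual policies."""
    return [[p1[a1] * p2[a2] for a2 in range(N)] for a1 in range(N)]


def ce_gap(joint):
    """Largest expected gain any agent gets from its best swap deviation.

    For each recommended action, the agent picks the best replacement
    action; the gains are summed over recommendations. A gap <= eps means
    the joint distribution is an eps-approximate correlated equilibrium.
    """
    gain1 = sum(
        max(sum(joint[rec][a2] * (REWARD[dev][a2] - REWARD[rec][a2])
                for a2 in range(N)) for dev in range(N))
        for rec in range(N))
    gain2 = sum(
        max(sum(joint[a1][rec] * (REWARD[a1][dev] - REWARD[a1][rec])
                for a1 in range(N)) for dev in range(N))
        for rec in range(N))
    return max(gain1, gain2)


# A correlated joint policy: the agents always pick the same action.
correlated = [[0.5, 0.0],
              [0.0, 0.5]]
# The best independent policies with the same marginals lose the correlation.
independent = product_policy(*marginals(correlated))
# A miscoordinated joint policy: always play the action pair (0, 1).
miscoordinated = [[0.0, 1.0],
                  [0.0, 0.0]]
```

In this toy game the correlated distribution and its product-of-marginals counterpart share the same individual marginals, yet only the correlated one places all probability on matching actions. Comparing `expected_reward(correlated)` with `expected_reward(independent)` shows what is lost when agents act independently, and `ce_gap(miscoordinated)` shows a distribution that an agent would profitably deviate from.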
