Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning
Yaodong Yang, Jianye Hao, Ben Liao, Kun Shao, Guangyong Chen, Wulong Liu, Hongyao Tang
TL;DR
This work introduces Qatten, an attention-based framework for decomposing the global Q-value $Q_{tot}$ in multiagent reinforcement learning (MARL) into agent-wise components. It derives a theoretical expansion of $Q_{tot}$ in terms of the per-agent values $Q^{i}$ and implements a practical multi-head attention mixer that respects the Individual-Global-Max (IGM) property, so that decentralized greedy action selection remains tractable. On the StarCraft II SMAC benchmark, Qatten achieves state-of-the-art or competitive performance across easy, hard, and super-hard scenarios, and attention analyses offer interpretability into per-agent contributions. The approach advances both the theoretical understanding of how global and local Q-values are coupled and the practical ability to coordinate large multiagent systems on challenging benchmarks.
Abstract
In many real-world tasks, multiple agents must learn to coordinate with each other given their private observations and limited communication ability. Deep multiagent reinforcement learning (Deep-MARL) algorithms have shown superior performance in such challenging settings. One representative class of work is multiagent value decomposition, which decomposes the global shared multiagent Q-value $Q_{tot}$ into individual Q-values $Q^{i}$ to guide individuals' behaviors; for example, VDN imposes an additive form and QMIX adopts a monotonicity assumption through an implicit mixing method. However, most previous efforts impose certain assumptions on the relationship between $Q_{tot}$ and $Q^{i}$ and lack theoretical grounding. Moreover, they do not explicitly consider the agent-level impact of individuals on the whole system when transforming the individual $Q^{i}$ into $Q_{tot}$. In this paper, we theoretically derive a general formula for $Q_{tot}$ in terms of $Q^{i}$, based on which we can naturally implement a multi-head attention structure to approximate $Q_{tot}$. This yields not only a refined representation of $Q_{tot}$ with an agent-level attention mechanism, but also a tractable maximization algorithm for decentralized policies. Extensive experiments demonstrate that our method outperforms state-of-the-art MARL methods on the widely adopted StarCraft benchmark across different scenarios, and a further attention analysis provides valuable insights.
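To make the decomposition idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the mixer forms $Q_{tot}$ as a bias plus, for each attention head, a softmax-weighted sum of the individual $Q^{i}$, and that the attention weights do not depend on the chosen actions. The function names (`vdn_mix`, `attention_mix`, `greedy_joint_action`, `brute_force_joint_action`) and the fixed logits in `head_scores` are hypothetical; in a learned mixer the logits would presumably be produced from the global state and agent features. Because softmax weights are strictly positive, $Q_{tot}$ is increasing in every $Q^{i}$, so each agent maximizing its own $Q^{i}$ also maximizes $Q_{tot}$.

```python
import math
from itertools import product


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def vdn_mix(agent_qs):
    """VDN-style additive mixing: Q_tot is the plain sum of the Q^i."""
    return sum(agent_qs)


def attention_mix(agent_qs, head_scores, bias=0.0):
    """Hypothetical multi-head attention mixer (illustrative assumption).

    head_scores[h][i] is the attention logit of head h for agent i.
    Each head contributes a softmax-weighted sum of the agents' Q-values.
    """
    q_tot = bias
    for scores in head_scores:
        weights = softmax(scores)  # strictly positive, sums to 1
        q_tot += sum(w * q for w, q in zip(weights, agent_qs))
    return q_tot


def greedy_joint_action(per_agent_q):
    """Decentralized selection: every agent takes the argmax of its own Q^i."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)


def brute_force_joint_action(per_agent_q, head_scores):
    """Centralized selection: exhaustive argmax of Q_tot over joint actions."""
    joint_actions = product(*(range(len(q)) for q in per_agent_q))

    def q_tot_of(joint_action):
        chosen = [q[a] for q, a in zip(per_agent_q, joint_action)]
        return attention_mix(chosen, head_scores)

    return max(joint_actions, key=q_tot_of)


# Toy data (made up for illustration): 3 agents, 3 actions each, 2 heads.
per_agent_q = [
    [0.1, 0.9, 0.3],
    [0.5, 0.2, 0.4],
    [1.0, 1.5, -0.2],
]
head_scores = [
    [0.2, -1.0, 0.5],
    [1.3, 0.0, -0.7],
]

decentralized = greedy_joint_action(per_agent_q)
centralized = brute_force_joint_action(per_agent_q, head_scores)
print(decentralized, centralized)
```

In this toy setting the decentralized and centralized choices coincide, which is the IGM-style consistency that a monotonic mixer is meant to guarantee. The exhaustive search is exponential in the number of agents, which is why a tractable decentralized argmax matters in practice.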
