Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning

Yaodong Yang, Jianye Hao, Ben Liao, Kun Shao, Guangyong Chen, Wulong Liu, Hongyao Tang

TL;DR

This work introduces Qatten, an attention-based framework for expressing the global Q-value $Q_{tot}$ of cooperative MARL in terms of per-agent Q-values $Q^{i}$. It derives a theoretical expansion of $Q_{tot}$ in terms of $Q^{i}$ and implements a practical multi-head attention mixer that respects the Individual-Global-Max (IGM) property, so decentralized greedy action selection remains tractable. Empirical results on the StarCraft II SMAC benchmark show Qatten achieving state-of-the-art or competitive performance across easy, hard, and super-hard scenarios, and attention analyses offer interpretability into per-agent contributions. The approach advances both the theoretical understanding of how $Q_{tot}$ couples to $Q^{i}$ and the practical ability to coordinate large multiagent systems on challenging benchmarks.

Abstract

In many real-world tasks, multiple agents must learn to coordinate with each other given their private observations and limited communication ability. Deep multiagent reinforcement learning (Deep-MARL) algorithms have shown superior performance in such challenging settings. One representative class of work is multiagent value decomposition, which decomposes the global shared multiagent Q-value $Q_{tot}$ into individual Q-values $Q^{i}$ to guide individuals' behaviors, e.g., VDN, which imposes an additive form, and QMIX, which adopts a monotonicity assumption through an implicit mixing method. However, most previous efforts impose certain assumptions on the relation between $Q_{tot}$ and $Q^{i}$ and lack theoretical grounding. Moreover, they do not explicitly consider the agent-level impact of individuals on the whole system when transforming the individual $Q^{i}$s into $Q_{tot}$. In this paper, we theoretically derive a general formula for $Q_{tot}$ in terms of $Q^{i}$, based on which we can naturally implement a multi-head attention structure to approximate $Q_{tot}$. This yields not only a refined representation of $Q_{tot}$ with an agent-level attention mechanism, but also a tractable maximization algorithm for decentralized policies. Extensive experiments demonstrate that our method outperforms state-of-the-art MARL methods on the widely adopted StarCraft benchmark across different scenarios, and a further attention analysis provides valuable insights.
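The idea of mixing per-agent values with agent-level attention weights can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy Q-tables, and the fixed query and key vectors are all hypothetical, and the sketch assumes each head's query comes from the global state and each key from an agent's individual features. Learned embeddings, per-head weights, and recurrent agent networks are omitted. Because the softmax weights are strictly positive and do not depend on the chosen actions, the joint action that maximizes the mixed value coincides with each agent's own greedy action, which is the IGM property.

```python
import math
from itertools import product


def softmax(scores):
    """Numerically stable softmax; every output is strictly positive."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_mix(agent_qs, queries, keys_per_head, c_s=0.0):
    """Toy mixer: Q_tot = c(s) + sum over heads h and agents i of lambda_{i,h} * Q^i.

    queries[h] stands in for a state embedding and keys_per_head[h][i] for an
    embedding of agent i's features (both are assumptions of this sketch).
    """
    q_tot = c_s
    for query, keys in zip(queries, keys_per_head):
        scale = math.sqrt(len(query))  # scaled dot-product attention
        lambdas = softmax([dot(query, key) / scale for key in keys])
        q_tot += dot(lambdas, agent_qs)
    return q_tot


def greedy_joint_action(q_tables, queries, keys_per_head):
    """Centralized argmax of Q_tot by brute force over all joint actions."""
    joint_actions = product(*(range(len(table)) for table in q_tables))
    return max(
        joint_actions,
        key=lambda joint: attention_mix(
            [table[a] for table, a in zip(q_tables, joint)],
            queries,
            keys_per_head,
        ),
    )


def decentralized_actions(q_tables):
    """Each agent independently picks the argmax of its own Q^i."""
    return tuple(max(range(len(t)), key=t.__getitem__) for t in q_tables)


# Hypothetical toy problem: 3 agents with 3 actions each, 2 attention heads.
Q_TABLES = [[1.0, 3.0, 2.0], [0.5, -1.0, 4.0], [2.0, 0.0, 1.0]]
QUERIES = [[0.2, -0.4], [1.0, 0.5]]
KEYS_PER_HEAD = [
    [[0.1, 0.3], [-0.7, 0.2], [0.5, 0.5]],
    [[0.9, -0.1], [0.0, 0.4], [-0.3, 0.8]],
]

if __name__ == "__main__":
    print(greedy_joint_action(Q_TABLES, QUERIES, KEYS_PER_HEAD))
    print(decentralized_actions(Q_TABLES))
```

The brute-force search is exponential in the number of agents and is included only to check the property on a small case; in practice the point of the decomposition is that the per-agent argmax alone suffices. Setting all keys equal makes every weight uniform, which reduces each head to a scaled VDN-style sum.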

Paper Structure

This paper contains 24 sections, 1 theorem, 27 equations, 5 figures, 4 tables.

Key Result

Theorem 1

Assume that the action space is continuous and there is no independent agent. Then there exist constants $c(s),\lambda_i(s)$ (depending on state $s$), such that the local expansion of $Q_{tot}$ admits the following form where $\lambda_{i,h}$ is a linear functional of all partial derivatives $\frac{\partial^{h}Q_{tot}}{\partial Q^{i_1}...\partial Q^{i_h}}$ of order $h$, and decays super-exponentia

Figures (5)

  • Figure 1: The overall architecture of Qatten. On the right is agent $i$'s recurrent deep Q-network, which receives the action-observation history $\tau^{i}$ (the last hidden state $h_{t-1}^{i}$, the current local observation $o_{t}^{i}$, and the last action $a_{t-1}^{i}$). On the left is the mixing network of Qatten, which mixes $\vec{Q^i}(\tau^{i}_{t}, a^{i}_{t})$ together with $s_t$ and $\vec{u^{i}_{t}}$. In general, $s$ is the global state and $u^{i}$ denotes agent $i$'s individual features, such as its position.
  • Figure 2: Median win percentage on the hard scenarios (a-d) and super hard scenarios (e-f).
  • Figure 3: Ablation study of Qatten on three difficult scenarios.
  • Figure 4: Attention weights on 5m_vs_6m and 3s5z_vs_3s6z. Steps increase from top to bottom in the attention heat maps. The horizontal axis indicates the agent id under each head.
  • Figure 5: Attention weights on 5m_vs_6m and 3s5z_vs_3s6z. Steps increase from top to bottom in the attention heat maps. The horizontal axis indicates the agent id under each head.

Theorems & Definitions (2)

  • Theorem 1
  • proof