
Attention-Guided Contrastive Role Representations for Multi-Agent Reinforcement Learning

Zican Hu, Zongzhang Zhang, Huaxiong Li, Chunlin Chen, Hongyu Ding, Zhi Wang

TL;DR

The paper tackles homogeneous behaviors and evolving role dynamics in cooperative MARL under CTDE. It introduces ACORM, a framework that learns discriminative role representations through mutual-information–based contrastive learning and integrates them into value decomposition via an attention mechanism for expressive credit assignment. Empirical results on StarCraft II SMAC and Google Research Football show state-of-the-art performance and robust coordination, with ablations confirming the distinct contributions of contrastive role representations and attention-guided coordination. Visualizations further reveal meaningful role emergence and attention patterns that align with strategic team coordination, suggesting strong practical impact for complex multi-agent systems.

Abstract

Real-world multi-agent tasks usually involve dynamic team composition with the emergence of roles, which should also be a key to efficient cooperation in multi-agent reinforcement learning (MARL). Drawing inspiration from the correlation between roles and agents' behavior patterns, we propose a novel framework of **A**ttention-guided **CO**ntrastive **R**ole representation learning for **M**ARL (**ACORM**) to promote behavior heterogeneity, knowledge transfer, and skillful coordination across agents. First, we introduce mutual information maximization to formalize role representation learning, derive a contrastive learning objective, and concisely approximate the distribution of negative pairs. Second, we leverage an attention mechanism to prompt the global state to attend to learned role representations in value decomposition, implicitly guiding agent coordination in a skillful role space to yield more expressive credit assignment. Experiments on challenging StarCraft II micromanagement and Google Research Football tasks demonstrate the state-of-the-art performance of our method and its advantages over existing approaches. Our code is available at [https://github.com/NJU-RL/ACORM](https://github.com/NJU-RL/ACORM).
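
As a rough illustration of the contrastive objective mentioned in the abstract, the sketch below computes an InfoNCE-style loss for one query role representation against one positive key and a set of negative keys. The dot-product similarity, the temperature value, the function names, and the toy vectors are all assumptions made for illustration; they are not taken from the ACORM code.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for a single query.

    Returns -log( exp(s_pos) / (exp(s_pos) + sum_k exp(s_neg_k)) ),
    where s = dot(query, key) / temperature (similarity is an assumption).
    """
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, key) / temperature for key in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# Hypothetical 2-D role representations.
z_i = [1.0, 0.0]                       # query
z_pos = [0.9, 0.1]                     # key assumed to share the query's role
z_negs = [[0.0, 1.0], [-1.0, 0.0]]     # keys assumed to belong to other roles

loss_good = info_nce(z_i, z_pos, z_negs)                      # aligned positive
loss_bad = info_nce(z_i, z_negs[0], [z_pos, z_negs[1]])       # misaligned positive
```

In this toy setup the loss is small when the positive key is close to the query and large when a dissimilar key is treated as the positive, which is the pressure that pushes representations of different roles apart.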

Paper Structure

This paper contains 10 sections, 3 theorems, 12 equations, 13 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

Let $\mathcal{M}$ denote a set of roles following the role distribution $P(M)$, and $|\mathcal{M}|\!=\!K$. $M\!\in\! \mathcal{M}$ is a given role. Let $e\!=\!f_{\phi}(\sum_t (o^t,a^{t-1}))$, $z\!\sim\! f_{\theta}(z|e)$, and $h(e,z)\!=\!\frac{p(z|e)}{p(z)}$, where $\sum_t(o^t,a^{t-1})$ is the agent's

Figures (13)

  • Figure 1: The ACORM framework based on QMIX. (a) The overall architecture. (b) The structure of shared individual Q-network. (c) The detail of contrastive role representation learning, where $z_i$ is the query $q$, and $z_{i'}/z_{i^*}$ are positive/negative keys $k_+/k_-$. (d) The attention module that incorporates learned role representations into the mixing network's input for better value decomposition.
  • Figure 2: Performance comparison between ACORM and baselines on six representative maps.
  • Figure 3: Ablation studies. ACORM_w/o_CL removes contrastive learning, ACORM_w/o_MHA removes attention, and ACORM_w/o_MHA (Vanilla) removes attention and state encoding.
  • Figure 4: Example rendering scenes at three time steps in an evaluation trajectory generated by the trained ACORM policy on MMM2. The upper row shows screenshots of combat scenarios that contain the information of positions, health points, shield points, states of ally and enemy units, etc. The lower row visualizes the corresponding agent embeddings (denoted with bullets '$\bullet$') and role representations (denoted with stars '$\bm{\star}$') by projecting these vectors into 2D space via t-SNE for qualitative analysis, where agents within the same cluster are depicted using the same color.
  • Figure 5: Example rendering scenes in an evaluation trajectory generated by the trained ACORM policy on MMM2. The lower row visualizes attention weights ($\bm{\alpha}$ in the attention equation) of all four heads that explain how the global state attends to each role to guide skillful coordination in the role space. A higher weight means a larger contribution made by the corresponding role for value decomposition.
  • ...and 8 more figures
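
The attention module in Figure 1(d) and the weights in Figure 5 describe the global state attending to the learned role representations. A minimal single-head sketch of scaled dot-product attention follows, assuming the state and the roles have already been projected into a shared space. The caption of Figure 5 refers to four heads; the single head, the function names, and the toy vectors here are hypothetical simplifications rather than the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def state_attends_to_roles(state_query, role_keys, role_values):
    """Single-head scaled dot-product attention.

    state_query: projected global state (the query).
    role_keys / role_values: one vector per role representation.
    Returns the attention weights and the weighted sum of role values.
    """
    d = len(state_query)
    scores = [
        sum(q * k for q, k in zip(state_query, key)) / math.sqrt(d)
        for key in role_keys
    ]
    alpha = softmax(scores)  # one weight per role, summing to 1
    dim_v = len(role_values[0])
    context = [
        sum(a * value[j] for a, value in zip(alpha, role_values))
        for j in range(dim_v)
    ]
    return alpha, context


# Hypothetical projected state and three role representations.
state_q = [1.0, 0.0]
roles = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]
alpha, context = state_attends_to_roles(state_q, roles, roles)
```

Here the role most aligned with the state query receives the largest weight, matching the caption's reading that a higher weight means a larger contribution of that role to value decomposition.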

Theorems & Definitions (6)

  • Definition 1
  • Theorem 1
  • Lemma 1
  • proof
  • Theorem 1
  • proof