Interaction Pattern Disentangling for Multi-Agent Reinforcement Learning

Shunyu Liu, Jie Song, Yihe Zhou, Na Yu, Kaixuan Chen, Zunlei Feng, Mingli Song

TL;DR

Cooperative MARL suffers from entangled inter-entity interactions that cause overfitting and poor generalization. The paper introduces OPT, a Transformer-inspired module that disentangles interactions into $N$ sparse prototypes via $P_n = \operatorname{sparsemax}(\frac{\mathbf{Q}\mathbf{K}^T}{\sqrt{d_x}})$ and reconstructs a compact pattern $Y = \sum_{n=1}^N \omega_n P_n \mathbf{V}_n$ using a learnable aggregator; training is stabilized with a mutual information objective $I(\boldsymbol{\omega}^a_t; \tau^a_{t-1} | o^a_t)$ approximated by a variational posterior $q_\psi$. Key contributions include explicit interaction pattern disentangling with diverse, sparse prototypes via a contrastive disagreement loss, a reconstruction mechanism that selectively emphasizes salient prototypes, and empirical gains on StarCraft II, Google Research Football, and Predator-Prey in single-task, multi-task, and zero-shot settings. The work advances generalization and interpretability in MARL by making latent interaction patterns explicit and reusable across tasks, with potential for scalable extension to larger agent populations.
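
The disentangle-then-restructure step in the summary above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the per-prototype projection matrices `Wq`, `Wk`, `Wv`, the function name `opt_sketch`, and passing the aggregation weights `omega` in as a fixed vector are all assumptions made for illustration (in the paper the weights come from a learnable aggregator). The `sparsemax` helper follows the standard sort-and-threshold projection onto the probability simplex.

```python
import numpy as np

def sparsemax(z):
    """Row-wise sparsemax: Euclidean projection of each row onto the simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = -np.sort(-z, axis=-1)                 # each row in descending order
    k = np.arange(1, z.shape[-1] + 1)
    cumsum = np.cumsum(z_sorted, axis=-1)
    support = 1 + k * z_sorted > cumsum              # True for the first k(z) sorted entries
    k_z = support.sum(axis=-1, keepdims=True)        # support size per row (always >= 1)
    tau = (np.take_along_axis(cumsum, k_z - 1, axis=-1) - 1) / k_z
    return np.maximum(z - tau, 0.0)

def opt_sketch(X, Wq, Wk, Wv, omega):
    """Hypothetical sketch of Y = sum_n omega_n * P_n @ V_n.

    X:          (M, d_x) entity features
    Wq, Wk, Wv: (N, d_x, d_x) per-prototype projections (an assumption)
    omega:      (N,) aggregation weights (fixed here; learnable in the paper)
    """
    d_x = X.shape[-1]
    Y = np.zeros_like(X, dtype=float)
    prototypes = []
    for n in range(len(omega)):
        Q, K, V = X @ Wq[n], X @ Wk[n], X @ Wv[n]
        P = sparsemax(Q @ K.T / np.sqrt(d_x))        # (M, M) sparse interaction prototype
        prototypes.append(P)
        Y += omega[n] * (P @ V)                      # weighted restructuring
    return Y, np.stack(prototypes)

# Toy usage: 5 entities, 4 features, 3 prototypes.
rng = np.random.default_rng(0)
M, d_x, N = 5, 4, 3
X = rng.normal(size=(M, d_x))
Wq, Wk, Wv = (rng.normal(size=(N, d_x, d_x)) for _ in range(3))
omega = np.full(N, 1.0 / N)
Y, P = opt_sketch(X, Wq, Wk, Wv, omega)
```

Unlike softmax, sparsemax can assign exactly zero weight to an entity pair, which is what lets each prototype drop irrelevant interactions rather than merely down-weight them.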

Abstract

Deep cooperative multi-agent reinforcement learning has demonstrated remarkable success across a wide spectrum of complex control tasks. However, recent advances in multi-agent learning mainly focus on value decomposition while leaving entity interactions intertwined, which easily leads to over-fitting on noisy interactions between entities. In this work, we introduce a novel interactiOn Pattern disenTangling (OPT) method that disentangles the entity interactions into interaction prototypes, each of which represents an underlying interaction pattern within a subgroup of the entities. OPT facilitates filtering out the noisy interactions between irrelevant entities and thus significantly improves generalizability as well as interpretability. Specifically, OPT introduces a sparse disagreement mechanism to encourage sparsity and diversity among the discovered interaction prototypes. The model then selectively restructures these prototypes into a compact interaction pattern using an aggregator with learnable weights. To alleviate the training instability caused by partial observability, we propose to maximize the mutual information between the aggregation weights and the history behaviors of each agent. Experiments on single-task, multi-task and zero-shot benchmarks demonstrate that the proposed method yields results superior to the state-of-the-art counterparts. Our code is available at https://github.com/liushunyu/OPT.
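
The mutual information term mentioned in the abstract is typically made tractable with a standard variational lower bound. The derivation below is a generic sketch of that bound in the notation of the TL;DR ($\boldsymbol{\omega}^a_t$, $\tau^a_{t-1}$, $o^a_t$, $q_\psi$); the exact objective and any weighting used in the paper may differ.

$$
\begin{aligned}
I(\boldsymbol{\omega}^a_t; \tau^a_{t-1} \mid o^a_t)
&= H(\boldsymbol{\omega}^a_t \mid o^a_t) - H(\boldsymbol{\omega}^a_t \mid \tau^a_{t-1}, o^a_t) \\
&= H(\boldsymbol{\omega}^a_t \mid o^a_t) + \mathbb{E}\left[\log p(\boldsymbol{\omega}^a_t \mid \tau^a_{t-1}, o^a_t)\right] \\
&\geq H(\boldsymbol{\omega}^a_t \mid o^a_t) + \mathbb{E}\left[\log q_\psi(\boldsymbol{\omega}^a_t \mid \tau^a_{t-1}, o^a_t)\right]
\end{aligned}
$$

The inequality holds because the gap between the last two lines is an expected KL divergence $\mathbb{E}\left[D_{\mathrm{KL}}(p \,\|\, q_\psi)\right] \geq 0$, so maximizing the right-hand side with respect to $\psi$ tightens the bound.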

Paper Structure (20 sections, 16 equations, 10 figures, 4 tables, 1 algorithm)

Figures (10)

  • Figure 1: A visualization example of two shooting tasks with different scales. Green and red indicate the agents and enemies, respectively. The intertwined entity interactions can be disentangled into two interaction prototypes by filtering out the noisy interactions: (1) one agent kites most of the enemies; (2) the other agents besiege the lone enemy.
  • Figure 2: An overview of the framework based on the proposed interactiOn Pattern disenTangling (OPT) method. The middle is the basic MARL framework under the CTDE paradigm, where we use the OPT module in both the utility network and mixing network. The left is the utility network of each agent, and the right is the mixing network.
  • Figure 3: An illustrative diagram of the proposed interactiOn Pattern disenTangling (OPT) method, which mainly consists of two steps: disentangling and restructuring.
  • Figure 4: Learning curves of our method and value-based baselines in 6 single-task SMAC scenarios. All experimental results are illustrated with the mean and the standard deviation of the performance over five random seeds for a fair comparison. To make the results in figures clearer for readers, we adopt a 50% confidence interval to plot the error region.
  • Figure 5: Learning curves of our method and policy-based baselines in 6 single-task SMAC scenarios. We also provide the additional results of value-based QMIX and QPLEX using 20M training timesteps to ensure comparability.
  • ...and 5 more figures