Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learning

Xinran Li, Ling Pan, Jun Zhang

TL;DR

Multi-agent reinforcement learning (MARL) faces a trade-off between the sample efficiency of parameter sharing and the policy diversity of heterogeneous networks. Kaleidoscope addresses this by learning per-agent masks over a common parameter set using soft threshold reparameterization (STR), augmented with a diversity regularization term and a periodic reset to preserve network capacity. It also extends to critic ensembles in actor-critic setups. Empirically, Kaleidoscope improves performance over full- and partial-sharing baselines across MPE, MaMuJoCo, and SMACv2, while maintaining or reducing test-time compute. The approach offers a practical, scalable mechanism for enriching multi-agent policies and value estimates, with offline RL and meta-RL as possible future directions.

Abstract

In multi-agent reinforcement learning (MARL), parameter sharing is commonly employed to enhance sample efficiency. However, the popular approach of full parameter sharing often leads to homogeneous policies among agents, potentially limiting the performance benefits that could be derived from policy diversity. To address this critical limitation, we introduce Kaleidoscope, a novel adaptive partial parameter sharing scheme that fosters policy heterogeneity while still maintaining high sample efficiency. Specifically, Kaleidoscope maintains one set of common parameters alongside multiple sets of distinct, learnable masks for different agents, dictating the sharing of parameters. It promotes diversity among policy networks by encouraging discrepancy among these masks, without sacrificing the efficiencies of parameter sharing. This design allows Kaleidoscope to dynamically balance high sample efficiency with a broad policy representational capacity, effectively bridging the gap between full parameter sharing and non-parameter sharing across various environments. We further extend Kaleidoscope to critic ensembles in the context of actor-critic algorithms, which could help improve value estimations. Our empirical evaluations across extensive environments, including multi-agent particle environment, multi-agent MuJoCo and StarCraft multi-agent challenge v2, demonstrate the superior performance of Kaleidoscope compared with existing parameter sharing approaches, showcasing its potential for performance enhancement in MARL. The code is publicly available at https://github.com/LXXXXR/Kaleidoscope.
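
The masking idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the tensor layout, the magnitude-versus-sigmoid-threshold rule for building masks, and the pairwise L1 form of the discrepancy measure are all assumptions made for illustration. The sketch covers only the forward pass; training learnable thresholds would additionally need a gradient estimator, which is omitted here.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def agent_masks(theta0, s):
    """Build one binary mask per agent (assumed rule, for illustration).

    theta0: shared weights, shape (d_in, d_out).
    s:      per-agent threshold logits, shape (N, d_in, d_out).
    A weight is kept for agent i when its magnitude exceeds sigmoid(s[i]).
    """
    return (np.abs(theta0)[None, ...] > sigmoid(s)).astype(theta0.dtype)


def agent_forward(theta0, masks, obs):
    """Apply each agent's masked copy of the shared linear layer.

    obs: one observation per agent, shape (N, d_in). Returns (N, d_out).
    """
    theta = theta0[None, ...] * masks  # Hadamard product per agent
    return np.einsum("ni,nio->no", obs, theta)


def mask_discrepancy(theta0, masks):
    """Sum of L1 distances between masked weights over unordered agent pairs.

    This is a hypothetical stand-in for a diversity term, not the paper's loss.
    """
    theta = theta0[None, ...] * masks
    diff = theta[:, None] - theta[None, :]  # shape (N, N, d_in, d_out)
    return 0.5 * np.abs(diff).sum()
```

With all thresholds near zero, every agent keeps every weight and the sketch reduces to full parameter sharing. As the per-agent thresholds diverge, the agents' effective weights differ while the stored parameters remain a single shared set.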

Paper Structure

This paper contains 46 sections, 23 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: Full parameter sharing confines the policies to be homogeneous. In this example, all predators pursue the same prey, neglecting another prey in the game World. Further game details are in the supplementary material (environment details).
  • Figure 2: Overall network architecture of Kaleidoscope. It maintains one set of parameters $\theta_0$ with $N$ sets of masks $\left[\bm{M}_i\right]^N_{i=1}$ for actor networks, and one set of parameters $\phi_0$ with $K$ sets of masks $\left[\bm{M}_j^c\right]^K_{j=1}$ for critic ensemble networks, where $N$ is the number of agents, $K$ is the number of ensembles, and $\odot$ denotes the Hadamard product.
  • Figure 3: Illustration of the resetting mechanisms.
  • Figure 4: Performance comparison with baselines on MPE and MaMuJoCo benchmarks.
  • Figure 5: Performance comparison with baselines on SMACv2 benchmarks.
  • ...and 5 more figures
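
The critic side of Figure 2, one parameter set $\phi_0$ with $K$ masks $\bm{M}_j^c$, can be sketched in the same spirit. The linear critic, the function names, and the use of the ensemble mean and standard deviation as a summary are assumptions for illustration only; the caption specifies the masking structure but not how member outputs are combined.

```python
import numpy as np


def masked_critic_values(phi0, critic_masks, features):
    """Evaluate K masked linear critics that share one weight vector.

    phi0:         shared critic weights, shape (d,).
    critic_masks: binary masks M_j^c, shape (K, d).
    features:     batch of state(-action) features, shape (B, d).
    Returns per-member value estimates of shape (B, K).
    """
    phi = phi0[None, :] * critic_masks  # phi_j = phi_0 ⊙ M_j^c
    return features @ phi.T


def ensemble_summary(values):
    """Mean and spread across ensemble members (assumed aggregation)."""
    return values.mean(axis=1), values.std(axis=1)
```

The spread across members is zero wherever the masks agree on every feature that is active for a given input, so in this sketch any disagreement between critics comes entirely from the masks.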