Enhancing Heterogeneous Multi-Agent Cooperation in Decentralized MARL via GNN-driven Intrinsic Rewards

Jahir Sadik Monon, Deeparghya Dutta Barua, Md. Mosaddek Khan

TL;DR

CoHet is an algorithm that uses a novel Graph Neural Network (GNN) based intrinsic motivation to facilitate the learning of heterogeneous agent policies in decentralized settings, under partial observability and reward sparsity.

Abstract

Multi-agent Reinforcement Learning (MARL) is emerging as a key framework for various sequential decision-making and control tasks. Unlike their single-agent counterparts, multi-agent systems necessitate successful cooperation among the agents. The deployment of these systems in real-world scenarios often requires decentralized training, a diverse set of agents, and learning from infrequent environmental reward signals. These challenges become more pronounced under partial observability and the lack of prior knowledge about agent heterogeneity. While notable studies use intrinsic motivation (IM) to address reward sparsity or cooperation in decentralized settings, those dealing with heterogeneity typically assume centralized training, parameter sharing, and agent indexing. To overcome these limitations, we propose the CoHet algorithm, which utilizes a novel Graph Neural Network (GNN) based intrinsic motivation to facilitate the learning of heterogeneous agent policies in decentralized settings, under the challenges of partial observability and reward sparsity. Evaluation of CoHet in the Multi-agent Particle Environment (MPE) and Vectorized Multi-Agent Simulator (VMAS) benchmarks demonstrates superior performance compared to the state-of-the-art in a range of cooperative multi-agent scenarios. Our research is supplemented by an analysis of the impact of the agent dynamics model on the intrinsic motivation module, insights into the performance of different CoHet variants, and its robustness to an increasing number of heterogeneous agents.

Paper Structure (14 sections, 6 equations, 5 figures, 3 tables, 1 algorithm)

Figures (5)

  • Figure 1: Overview of the CoHet intrinsic reward architecture: utilizing the observation predictions of neighboring agents, it augments the self-supervised intrinsic rewards with the sparse environmental rewards to elicit collaborative actions
  • Figure 2: The per-agent dynamics models (dynamics-model diagram) are used for calculating the intrinsic rewards, which are then combined with the extrinsic reward from the environment, resulting in $r_{total_i}$ for each agent $i$. This combined reward is passed to the HetGPPO policy learning module (HetGPPO diagram) for heterogeneous policy learning
  • Figure 3: The agent dynamics model training process uses the ground-truth observation at the next timestep to train the agents to predict next observations more accurately
  • Figure 4: Mean Episodic Rewards in VMAS and MPE cooperative multi-agent benchmarks demonstrate that in each of the scenarios, both variants of CoHet (self/team) outperform the HetGPPO baseline, and outperform IPPO in four out of six tasks
  • Figure 5: Reward architecture evaluation for two agents in the MPE Joint Passage task
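
The captions for Figures 1 and 2 describe intrinsic rewards derived from neighboring agents' observation predictions and combined with the extrinsic reward into $r_{total_i}$. The sketch below illustrates that idea under stated assumptions only: it takes the intrinsic reward to be the negative mean Euclidean error between neighbors' predictions and the agent's true next observation, and it combines rewards with a weighted sum. The function names, the averaging, and the weight `beta` are hypothetical and are not taken from the paper.

```python
import math


def intrinsic_reward(true_next_obs, neighbor_predictions):
    """Assumed form: negative mean L2 error of neighbors' predictions.

    true_next_obs: the agent's actual next observation (list of floats).
    neighbor_predictions: one predicted next observation per neighbor.
    """
    if not neighbor_predictions:
        # No neighbors in range: no intrinsic signal in this sketch.
        return 0.0
    errors = [math.dist(true_next_obs, pred) for pred in neighbor_predictions]
    return -sum(errors) / len(errors)


def total_reward(extrinsic, intrinsic, beta=0.1):
    """Assumed combination: r_total_i = r_ext_i + beta * r_int_i.

    beta is a hypothetical weighting coefficient.
    """
    return extrinsic + beta * intrinsic


# Example: one neighbor predicts perfectly, the other is off by 5 units.
true_obs = [1.0, 2.0]
preds = [[1.0, 2.0], [4.0, 6.0]]
r_int = intrinsic_reward(true_obs, preds)
r_total = total_reward(0.0, r_int, beta=0.5)
```

In this sketch, an agent whose next observation is easy for its neighbors to predict receives a less negative intrinsic reward, which supplies a learning signal even when the environmental reward is zero.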