Fully Independent Communication in Multi-Agent Reinforcement Learning

Rafael Pina, Varuna De Silva, Corentin Artaud, Xiaolan Liu

TL;DR

This paper investigates how independent learners in MARL that do not share parameters can communicate, shows that this setting can lead to problems, and proposes a new learning scheme as a solution.

Abstract

Multi-Agent Reinforcement Learning (MARL) comprises a broad area of research within the field of multi-agent systems. Several recent works have focused specifically on the study of communication approaches in MARL. While multiple communication methods have been proposed, these might still be too complex and not easily transferable to more practical contexts. One reason for this is the use of the well-known parameter sharing trick. In this paper, we investigate how independent learners in MARL that do not share parameters can communicate. We demonstrate that this setting can lead to problems, and we propose a new learning scheme to address them. Our results show that, despite these challenges, independent agents can still learn communication strategies with our method. Additionally, we use this method to investigate how communication in MARL is affected by different network capacities, both with and without parameter sharing. We observe that communication may not always be needed and that the chosen agent network sizes must be considered when combined with communication in order to achieve efficient learning.
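
The abstract contrasts sharing and not sharing parameters across different network capacities. As a rough illustration of how the two settings scale, the sketch below counts trainable parameters for one-hidden-layer policy networks. Everything here is an assumption for illustration only: the MLP shape, the one-hot agent ID appended to observations under parameter sharing (a common practice, not confirmed by this summary), and the sizes (3 agents, observation size 10, 5 actions). Communication networks are not counted, and the function names are hypothetical.

```python
def mlp_params(sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


def shared_params(n_agents, obs_dim, hidden, n_actions):
    """Parameter sharing (PS): one network serves every agent.
    Assumption: a one-hot agent ID is appended to each observation."""
    return mlp_params([obs_dim + n_agents, hidden, n_actions])


def independent_params(n_agents, obs_dim, hidden, n_actions):
    """No parameter sharing (NPS): each agent owns a separate copy of the network."""
    return n_agents * mlp_params([obs_dim, hidden, n_actions])


# Hypothetical sizes: 3 agents, observation size 10, 5 actions.
for hidden in (32, 64, 128):
    ps = shared_params(3, 10, hidden, 5)
    nps = independent_params(3, 10, hidden, 5)
    print(f"hidden={hidden}: PS={ps}, NPS={nps}")
```

Under these assumptions, the same hidden size gives roughly n times more trainable parameters without sharing (for example, 1221 versus 3087 at a hidden size of 64). This is one way to see why the choice of network size interacts with the decision to share parameters.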

Paper Structure (18 sections, 21 equations, 8 figures, 1 algorithm)

Figures (8)

  • Figure 1: Simple overview of the configurations of sharing (left) and not sharing (right) parameters. In the former, a joint action is produced by policies that are part of the same network (shared by all the agents), while in the latter each agent has its own separate policy.
  • Figure 2: Illustration of the main differences in the process of generating and broadcasting messages between sharing and not sharing parameters of the learning networks. In the first case, both the policies and communication networks are controlled by the same parameters $\theta$ and $\mu$, while in the second case, these have distinct parameters $\theta_i$ and $\mu_i$.
  • Figure 3: Illustration of how our proposed scheme for independent communication without parameter sharing (NPS+IQL+COMM) works compared to sharing parameters (PS). Agents that do not share parameters also need to receive their own message as input to keep the link to the computational graph of their communication network during backpropagation. When parameters are shared, this trick is not needed: all agents use the same network, so no links to the communication network are lost from the computation graph and gradients propagate without problems.
  • Figure 4: Environments used in the experiments. On the left, 3s_vs_5z, a scenario from the SMAC collection [smac_2019]; on the right, a PredatorPrey game [magym], where four predators must catch two moving prey.
  • Figure 5: Win rates achieved by the tested methods in 3s_vs_5z. The dashed line (optimum) marks the best value the agents could achieve, i.e., a win rate of 1. For completeness, we include the corresponding rewards for these win rates in the supplementary material.
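
Figures 2 and 3 state that, without parameter sharing, each agent $i$ has its own communication parameters $\mu_i$ and must also receive its own message to keep $\mu_i$ on its computation graph. The dependency-free toy sketch below illustrates that gradient argument. It is not the authors' implementation, and it rests on several assumptions: two agents, scalar observations, a linear communication map $m_i = \mu_i o_i$, a linear Q-function, a fixed stand-in target, and each agent updated only from its own loss (our reading of fully independent learning). It uses central finite differences instead of autodiff so that it needs no third-party libraries. All names (`message`, `q_value`, `agent_loss`, `grad_own_mu`) and numbers are hypothetical.

```python
def message(mu, obs):
    """Agent's (hypothetical) communication network: a scalar message mu * obs."""
    return mu * obs


def q_value(theta, obs, received):
    """Linear Q-value: theta[0] weights the observation, the rest weight messages."""
    return theta[0] * obs + sum(w * m for w, m in zip(theta[1:], received))


def agent_loss(i, mus, thetas, obs, target, include_own):
    """Squared error of agent i's Q-value against a fixed stand-in target."""
    msgs = [message(mus[j], obs[j]) for j in range(len(obs))]  # all agents broadcast
    # Agent i reads the other agents' messages, plus its own if include_own is True.
    received = [msgs[j] for j in range(len(obs)) if include_own or j != i]
    q = q_value(thetas[i], obs[i], received)
    return (q - target) ** 2


def grad_own_mu(i, mus, thetas, obs, target, include_own, eps=1e-6):
    """Central finite difference of agent i's loss with respect to its own mu_i."""
    up, down = list(mus), list(mus)
    up[i] += eps
    down[i] -= eps
    return (agent_loss(i, up, thetas, obs, target, include_own)
            - agent_loss(i, down, thetas, obs, target, include_own)) / (2 * eps)


obs, mus, target = [1.0, 2.0], [0.5, -0.3], 1.0

# Without its own message, agent 0's loss never depends on mu_0.
thetas_no_own = [[0.2, 0.7], [0.1, 0.3]]
g_without = grad_own_mu(0, mus, thetas_no_own, obs, target, include_own=False)

# With its own message as input, mu_0 is back on agent 0's computation graph.
thetas_own = [[0.2, 0.4, 0.7], [0.1, -0.5, 0.3]]
g_with = grad_own_mu(0, mus, thetas_own, obs, target, include_own=True)

print(g_without, g_with)  # 0.0 and approximately -0.816
```

In this toy, agent 0's loss has exactly zero gradient with respect to $\mu_0$ when the agent does not read its own message. An optimizer that updates agent 0 only from its own loss would therefore never change its communication network. Feeding the message back restores a nonzero gradient, analytically $2(q - \text{target}) \cdot 0.4 \cdot o_0 = -0.816$ here.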

Theorems & Definitions (2)

  • Remark 1
  • Remark 2