Inductive Bias for Emergent Communication in a Continuous Setting

John Isak Fjellvang Villanger, Troels Arnfred Bojesen

TL;DR

An inductive bias for emergent communication with continuous messages is shown to have a beneficial effect on the communication protocols learnt in two toy environments, Negotiation and Sequence Guess.

Abstract

We study emergent communication in a multi-agent reinforcement learning setting, where the agents solve cooperative tasks and have access to a communication channel. The communication channel may consist of either discrete symbols or continuous variables. We introduce an inductive bias to aid the emergence of good communication protocols for continuous messages, and we examine the effect this type of inductive bias has for continuous and discrete messages, both on its own and in combination with reinforcement learning. We demonstrate that this type of inductive bias has a beneficial effect on the communication protocols learnt in two toy environments, Negotiation and Sequence Guess.

Paper Structure (18 sections, 12 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Example of continuous positive signaling where messages contain two components. The distance $d_{i,j} = d(\boldsymbol{m}_i, \boldsymbol{m}_j)$ is calculated between every pair of messages. The circles indicate where $d_{i,j} = \lambda_2/\lambda_1$, in other words the boundary inside which message $\boldsymbol{m}_i$ is considered too close to $\boldsymbol{m}_j$. $\boldsymbol{m}_3$ and $\boldsymbol{m}_4$ show a case where $d_{3,4} < \lambda_2/\lambda_1$. $\boldsymbol{m}_5$ illustrates that the space wraps around.
  • Figure 2: Negotiation. An example run of two agents negotiating over three different types of beverages. Messages have no predefined meaning. The hidden utilities indicate how each beverage is weighted when calculating the reward. A proposal of $[0.9, 0.3, 0.5]$ from agent $A$ would mean agent $A$ receives these proposed fractions of each beverage (here: soda, water and orange juice), while agent $B$ receives the remainder. Agent $B$ can either accept this proposal or make a counter-proposal. In this example, agent $A$ accepts $B$'s counter-proposal and the negotiation ends. The agents' individual raw rewards will be $0.5 \cdot 0.8 + 0.7 \cdot 0.35 + 0.6 \cdot 0.5 \approx 0.95$ and $0.5 \cdot 0.4 + 0.3 \cdot 0.2 + 0.4 \cdot 0.8 \approx 0.58$, which leads to a shared reward of $r_\text{Neg} \approx (0.95 + 0.58)/(0.8 + 0.35 + 0.8) \approx 0.78$. The robots have been taken from Robot1, Robot2. The beverages have been taken from beverage_1, beverage_2, beverage_3.
  • Figure 3: An excerpt of Sequence Guess. The guesser attempts to guess some target sequence, while the mastermind tries to provide information about the target sequence to the guesser. Here, the alphabet size is 3 and the target sequence length is 3, while messages consist of one real number. The robot figures are from Robot1, Robot2.
  • Figure 4: Summary of 30 independent experiment runs for each game and loss function combination investigated. Each sample is an average of a mini-batch of size 2048. The lines indicate the means, while the bands denote $95\%$ confidence intervals. The rows display the different games: continuous message (CM) Sequence Guess, discrete message (DM) Sequence Guess, and Negotiation. The columns are organized according to whether interagent gradients are allowed to flow or not. The returns have been scaled to ensure that their maximal expectation values are one. See the text for more details.
  • Figure 5: The encoder-decoder architecture used for DM Sequence Guess. In the case of the mastermind, input $x_t$ will contain symbol number $t$ from the guess sequence and target sequence, $T_1$ will be the length of the target sequence, and $T_2$ will be the length of the message sequence. The output $y_t$ is fed to a fully connected layer with a Softmax activation function in order to find message symbol number $t$. In the case of the guesser, input $x_t$ will contain symbol number $t$ from the message, $T_1$ will be the length of the message sequence, and $T_2$ will be the length of the target sequence. Here $y_t$ is fed to a fully connected layer with a Softmax activation function in order to find guess symbol number $t$. In both cases, a one-hot encoding of the current turn is appended to the final hidden state of the Encoder in order to produce the context vector.
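
The reward arithmetic in the Figure 2 caption can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' implementation: the function and variable names are hypothetical, the utilities and accepted shares are read off the caption, and the normaliser is assumed to be the sum over beverages of the larger of the two agents' utilities, which is what the denominator $0.8 + 0.35 + 0.8$ in the caption suggests.

```python
# Sketch of the Figure 2 reward arithmetic. All names are hypothetical;
# the numbers are taken from the caption.

def negotiation_reward(utilities_a, utilities_b, share_a):
    """Return (raw reward of A, raw reward of B, shared reward)."""
    # Agent B receives whatever fraction of each item agent A does not.
    share_b = [1.0 - s for s in share_a]

    # Raw reward: received fraction of each item weighted by hidden utility.
    raw_a = sum(s * u for s, u in zip(share_a, utilities_a))
    raw_b = sum(s * u for s, u in zip(share_b, utilities_b))

    # Assumption: the normaliser is the best achievable total, obtained by
    # giving each item entirely to the agent that values it more.
    best_total = sum(max(ua, ub) for ua, ub in zip(utilities_a, utilities_b))

    return raw_a, raw_b, (raw_a + raw_b) / best_total


# Hidden utilities for soda, water and orange juice, as in the caption.
utilities_a = [0.8, 0.35, 0.5]
utilities_b = [0.4, 0.2, 0.8]
# Fractions agent A receives under the accepted counter-proposal.
share_a = [0.5, 0.7, 0.6]

raw_a, raw_b, shared = negotiation_reward(utilities_a, utilities_b, share_a)
print(f"raw A = {raw_a:.3f}, raw B = {raw_b:.3f}, shared = {shared:.2f}")
```

Under this assumed normalisation the shared reward equals one only when every beverage goes entirely to the agent that values it most, which is consistent with the caption's value of roughly 0.78 for a split that is good but not optimal.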