Noisy Zero-Shot Coordination: Breaking The Common Knowledge Assumption In Zero-Shot Coordination Games

Usman Anwar, Ashish Pandian, Jia Wan, David Krueger, Jakob Foerster

TL;DR

It is shown that with NZSC training, RL agents can be trained to coordinate well with novel partners even when the (exact) problem setting of the coordination is not common knowledge.

Abstract

Zero-shot coordination (ZSC) is a popular setting for studying the ability of reinforcement learning (RL) agents to coordinate with novel partners. Prior ZSC formulations assume the $\textit{problem setting}$ is common knowledge: each agent knows the underlying Dec-POMDP, knows others have this knowledge, and so on ad infinitum. However, this assumption rarely holds in complex real-world settings, which are often difficult to fully and correctly specify. Hence, in settings where this common knowledge assumption is invalid, agents trained using ZSC methods may not be able to coordinate well. To address this limitation, we formulate the $\textit{noisy zero-shot coordination}$ (NZSC) problem. In NZSC, agents observe different noisy versions of the ground truth Dec-POMDP, which are assumed to be distributed according to a fixed noise model. Only the distribution of ground truth Dec-POMDPs and the noise model are common knowledge. We show that an NZSC problem can be reduced to a ZSC problem by designing a meta-Dec-POMDP with an augmented state space consisting of all the ground-truth Dec-POMDPs. For solving NZSC problems, we propose a simple and flexible meta-learning method called NZSC training, in which the agents are trained across a distribution of coordination problems of which they observe only noisy versions. We show that with NZSC training, RL agents can be trained to coordinate well with novel partners even when the (exact) problem setting of the coordination is not common knowledge.
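
The sampling process described in the abstract can be written compactly. The following is only a sketch: the symbols $p$ (distribution over ground-truth Dec-POMDPs), $\eta$ (noise model), $R_{E^*}$ (return obtained in the ground-truth problem) and $J$ are notation assumed here for illustration, and the independence of the two agents' noisy views given $E^*$ is likewise an assumption, not something stated in the abstract.

$$E^* \sim p(E), \qquad E_i \sim \eta(\cdot \mid E^*) \quad \text{for } i \in \{A, B\}$$

$$J(\pi_A, \pi_B) = \mathbb{E}_{E^* \sim p,\; E_A, E_B \sim \eta(\cdot \mid E^*)} \Big[ R_{E^*}\big(\pi_A(\cdot \mid E_A),\, \pi_B(\cdot \mid E_B)\big) \Big]$$

Under this reading, only $p$ and $\eta$ are common knowledge; each policy conditions on its own private view $E_i$, while the return is always evaluated in $E^*$.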

Paper Structure

This paper contains 29 sections, 1 theorem, 4 equations, 14 figures, 1 table.

Key Result

Lemma 1

The following is a symmetry in the noisy lever game where $\phi_s, \phi_a, \phi_o$ are permutation maps of the state space, the action space, and the observation space respectively.

Figures (14)

  • Figure 1: In zero-shot coordination, the environment $E^*$ is assumed to be common knowledge (CK). In noisy zero-shot coordination, agents still have to act and coordinate in $E^*$ but $E^*$ is no longer CK. Instead, each agent has a distinct (private) model of the problem setting which is assumed to be a noisy copy of $E^*$.
  • Figure 2: In the noisy lever game, depicted here, there are three levers corresponding to three different reward values. Agents get the reward if they both pull the same lever. (a) shows the ground truth game $E^*$. (b) shows $E_A$, the noisy version of $E^*$ observed by player A. Similarly, (c) shows $E_B$, the noisy version of $E^*$ observed by player B.
  • Figure 3: Visualization of the Coordinated Exploration Environment (CEE). The agent start position is denoted by $\rightarrow$, while gold chests denote different mines. The mine in the bottom left gives a reward of $1$ when mined and is always located in the same square. For the other three mines, the reward values are sampled randomly from normal distributions with means $20$, $10$ and $10$, and the locations of these mines are randomized in every episode. The bottom right corner contains the key, which, if collected, allows agents to $3$x any future rewards collected. The agents observe a 3x3 grid around them.
  • Figure 4: A depiction of the SyncSight Environment (SSE). The black line in the middle is the impassable barrier that divides the grid into two symmetric subgrids. Agent A1 is shown with a view size of $2$, while agent A2 is shown with a view size of $5$. The numbers at the top of each column correspond to the mean of the normal distribution from which the reward values for the squares in that column are sampled.
  • Figure 5: Cross-play return for agents trained via Self-Play.
  • ...and 9 more figures
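
The noisy lever game of Figure 2 can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the uniform prior over lever rewards, the additive Gaussian noise model, the `greedy` policy, and all function names are hypothetical choices made for illustration. It shows only that agents who each naively trust their private view coordinate worse as the noise grows.

```python
import random


def sample_ground_truth(rng, n_levers=3, low=0.0, high=1.0):
    """Sample hypothetical ground-truth lever rewards E* (uniform prior is an assumption)."""
    return [rng.uniform(low, high) for _ in range(n_levers)]


def noisy_view(rng, rewards, sigma):
    """Return one agent's private noisy copy of E* (additive Gaussian noise is an assumption)."""
    return [r + rng.gauss(0.0, sigma) for r in rewards]


def joint_reward(rewards, action_a, action_b):
    """Agents receive the true lever reward only if both pull the same lever."""
    return rewards[action_a] if action_a == action_b else 0.0


def greedy(view):
    """Naive policy: pull the lever that looks best in the agent's private view."""
    return max(range(len(view)), key=lambda i: view[i])


def average_return(sigma, episodes=2000, seed=0):
    """Average true reward when both agents act greedily on independent noisy views."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        e_star = sample_ground_truth(rng)      # ground truth, seen by neither agent
        e_a = noisy_view(rng, e_star, sigma)   # player A's view, E_A
        e_b = noisy_view(rng, e_star, sigma)   # player B's view, E_B
        total += joint_reward(e_star, greedy(e_a), greedy(e_b))
    return total / episodes


if __name__ == "__main__":
    for sigma in (0.0, 0.1, 1.0):
        print(f"sigma={sigma}: average return {average_return(sigma):.3f}")
```

With `sigma=0.0` both views equal $E^*$, so the greedy agents always agree; with larger `sigma` their views diverge and they more often pull different levers, which is the failure mode that motivates training across noisy views.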

Theorems & Definitions (1)

  • Lemma 1