Revisiting Parameter Sharing in Multi-Agent Deep Reinforcement Learning

J. K. Terry, Nathaniel Grammel, Sanghyun Son, Benjamin Black, Aakriti Agrawal

TL;DR

This work formalizes the notion of agent indication and proves, for the first time, that it enables convergence to optimal policies. It also formally introduces methods that extend parameter sharing to heterogeneous observation and action spaces, and proves that these methods allow convergence to optimal policies.

Abstract

Parameter sharing, where each agent independently learns a policy with fully shared parameters between all policies, is a popular baseline method for multi-agent deep reinforcement learning. Unfortunately, since all agents share the same policy network, they cannot learn different policies or tasks. This issue has been circumvented experimentally by adding an agent-specific indicator signal to observations, which we term "agent indication". Agent indication is limited, however, in that without modification it does not allow parameter sharing to be applied to environments where the action spaces and/or observation spaces are heterogeneous. This work formalizes the notion of agent indication and proves, for the first time, that it enables convergence to optimal policies. Next, we formally introduce methods to extend parameter sharing to learning in heterogeneous observation and action spaces, and prove that these methods allow for convergence to optimal policies. Finally, we confirm experimentally that the methods we introduce work in practice, and conduct a wide array of experiments studying the empirical efficacy of many different agent indication schemes for image-based observation spaces.
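The idea of agent indication described above can be illustrated with a minimal sketch. It assumes flat vector observations and a one-hot indicator, which is only one possible indication scheme and not necessarily the one the paper evaluates. The names `add_agent_indicator` and `shared_policy`, and the hand-set linear weights, are hypothetical and chosen purely for illustration.

```python
def add_agent_indicator(obs, agent_index, num_agents):
    """Append a one-hot agent indicator to a flat observation (assumed scheme)."""
    indicator = [0.0] * num_agents
    indicator[agent_index] = 1.0
    return list(obs) + indicator


def shared_policy(indicated_obs, weights):
    """Hypothetical shared linear policy: every agent uses the same weights."""
    logits = [sum(w * x for w, x in zip(row, indicated_obs)) for row in weights]
    # Return the index of the largest logit (greedy action).
    return max(range(len(logits)), key=logits.__getitem__)


num_agents, obs_dim, num_actions = 2, 3, 2
obs = [0.5, -1.0, 2.0]  # both agents receive the same raw observation

# Hand-set weights (an assumption, not learned): only the indicator entries matter.
weights = [[0.0] * (obs_dim + num_agents) for _ in range(num_actions)]
weights[0][obs_dim + 0] = 1.0  # indicator for agent 0 favors action 0
weights[1][obs_dim + 1] = 1.0  # indicator for agent 1 favors action 1

actions = [
    shared_policy(add_agent_indicator(obs, i, num_agents), weights)
    for i in range(num_agents)
]
```

Without the indicator, both agents would feed identical inputs to the shared parameters and necessarily act identically; with it, the same parameters can produce different behavior per agent.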

Paper Structure

This paper contains 24 sections, 3 theorems, 1 equation, 5 figures, 6 tables.

Key Result

Lemma 1

If $G = \langle \mathcal{S}, N, \{\mathcal{A}_{i}\}, P, \{R_{i}\}, \{\Omega_{i}\}, \{O_{i}\} \rangle$ is a POSG such that $\{\Omega_{i}\}_{i\in[N]}$ is disjoint (i.e., $\Omega_{i}\cap \Omega_{j} = \emptyset$ for all $i\ne j$), then any collection of policies $\{\pi_{i}\}_{i\in[N]}$ can be expressed
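The lemma statement is cut off in this listing, so the following is not a restatement of its conclusion. It is only a small sketch of why the disjointness hypothesis is useful: when the observation spaces do not overlap, an observation alone identifies which agent it belongs to, so a single function can dispatch to the right per-agent behavior. The helper `combine_policies` and the toy finite observation sets are hypothetical.

```python
def combine_policies(policies, observation_spaces):
    """Build one function over the union of pairwise-disjoint observation sets.

    policies[i] is a callable for agent i; observation_spaces[i] is a finite
    set of hashable observations (a toy stand-in for Omega_i).
    """
    # Check the disjointness hypothesis: Omega_i and Omega_j share no element.
    for i in range(len(observation_spaces)):
        for j in range(i + 1, len(observation_spaces)):
            if observation_spaces[i] & observation_spaces[j]:
                raise ValueError(f"observation spaces {i} and {j} overlap")

    # Because the sets are disjoint, each observation has exactly one owner.
    owner = {o: i for i, space in enumerate(observation_spaces) for o in space}

    def combined(observation):
        return policies[owner[observation]](observation)

    return combined


spaces = [{"a0", "a1"}, {"b0", "b1"}]
policies = [lambda o: "left", lambda o: "right"]
combined = combine_policies(policies, spaces)
```

Here `combined("a1")` defers to agent 0's policy and `combined("b0")` to agent 1's. If the sets overlapped, the owner of a shared observation would be ambiguous, which is why the sketch rejects that case.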

Figures (5)

  • Figure 1: The 5 different agent indicator methods for image-based observations. Two different agents are shown on the left and right to illustrate how the agent indicators differ between methods.
  • Figure 2: (a)-(e) Error plots for the different environments, using the 10 best hyperparameter and agent indicator combinations. (f) Heatmap showing the number of times each agent indication method is among the best methods for a particular environment.
  • Figure 3: Average reward over 10 training runs (with different seeds) of the 10 best hyperparameter/agent indication combinations.
  • Figure 4: Images of the benchmark environments from terry2020pettingzoo.
  • Figure 5: Learning Graph

Theorems & Definitions (6)

  • Definition 1: Partially-Observable Stochastic Game
  • Lemma 1
  • proof
  • Corollary 1
  • Theorem 1
  • proof