Boosting Sample Efficiency and Generalization in Multi-agent Reinforcement Learning via Equivariance

Joshua McClellan, Naveed Haghani, John Winder, Furong Huang, Pratap Tokekar

TL;DR

This paper shows that EGNNs improve sample efficiency and generalization in MARL, and that the proposed E2GN2 achieves significantly better sample efficiency, higher final reward, and a 2x-5x gain over standard GNNs in the generalization tests.

Abstract

Multi-Agent Reinforcement Learning (MARL) struggles with sample inefficiency and poor generalization [1]. These challenges are partially due to a lack of structure or inductive bias in the neural networks typically used to learn the policy. One form of structure that is commonly observed in multi-agent scenarios is symmetry. The field of Geometric Deep Learning has developed Equivariant Graph Neural Networks (EGNN) that are equivariant (or symmetric) to rotations, translations, and reflections of nodes. Incorporating equivariance has been shown to improve learning efficiency and decrease error [2]. In this paper, we demonstrate that EGNNs improve sample efficiency and generalization in MARL. However, we also show that a naive application of EGNNs to MARL results in poor early exploration due to a bias in the EGNN structure. To mitigate this bias, we present Exploration-enhanced Equivariant Graph Neural Networks, or E2GN2. We compare E2GN2 to other common function approximators on the common MARL benchmarks MPE and SMACv2. E2GN2 demonstrates a significant improvement in sample efficiency, greater final reward convergence, and a 2x-5x gain over standard GNNs in our generalization tests. These results pave the way for more reliable and effective solutions in complex multi-agent systems.
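To make the equivariance property in the abstract concrete, the following is a minimal sketch of one EGNN-style layer in pure Python, assuming the commonly used EGNN update in which messages depend only on invariant features and squared distances, and coordinates are updated along relative position vectors. The functions `phi_e`, `phi_x`, and `phi_h` are hypothetical fixed stand-ins for learned MLPs, and `egnn_layer` and `transform` are illustrative names, not the paper's implementation. The check confirms numerically that reflecting, rotating, and translating the inputs transforms the coordinate outputs the same way and leaves the invariant features unchanged.

```python
import math

def egnn_layer(x, h):
    """One EGNN-style layer on a fully connected graph (illustrative only).

    x: list of 2D coordinates (equivariant features).
    h: list of scalar node features (invariant features).
    phi_e, phi_x, phi_h are hypothetical stand-ins for learned MLPs.
    """
    phi_e = lambda hi, hj, d2: math.tanh(0.5 * hi - 0.3 * hj + 0.1 * d2)
    phi_x = lambda m: 0.7 * m + 0.2
    phi_h = lambda hi, agg: math.tanh(hi + 0.4 * agg)

    n = len(x)
    new_x, new_h = [], []
    for i in range(n):
        dx, dy, agg = 0.0, 0.0, 0.0
        for j in range(n):
            if i == j:
                continue
            # Relative position rotates with the scene; its squared norm does not.
            rx, ry = x[i][0] - x[j][0], x[i][1] - x[j][1]
            m = phi_e(h[i], h[j], rx * rx + ry * ry)
            w = phi_x(m)
            dx += rx * w
            dy += ry * w
            agg += m
        new_x.append((x[i][0] + dx / (n - 1), x[i][1] + dy / (n - 1)))
        new_h.append(phi_h(h[i], agg))
    return new_x, new_h

def transform(points, theta, shift, reflect=False):
    """Optionally reflect across the y-axis, rotate by theta, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for px, py in points:
        if reflect:
            px = -px
        out.append((c * px - s * py + shift[0], s * px + c * py + shift[1]))
    return out

x = [(0.0, 1.0), (2.0, -1.0), (-1.5, 0.5)]
h = [0.1, -0.4, 0.9]
theta, shift = 1.1, (3.0, -2.0)

# Path A: run the layer, then transform its coordinate output.
x_out, h_out = egnn_layer(x, h)
expected = transform(x_out, theta, shift, reflect=True)

# Path B: transform the input, then run the layer.
x_out_t, h_out_t = egnn_layer(transform(x, theta, shift, reflect=True), h)

coord_err = max(abs(a - b) for p, q in zip(x_out_t, expected) for a, b in zip(p, q))
feat_err = max(abs(a - b) for a, b in zip(h_out_t, h_out))
print(coord_err, feat_err)  # both should be at floating-point noise level
```

Both paths agree because every quantity fed to the stand-in MLPs is invariant, while the only geometric quantities that enter the coordinate update are relative position vectors.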

Paper Structure (24 sections, 5 theorems, 26 equations, 12 figures, 5 tables)

Key Result

Theorem 1

Given a layer $l$ of an EGNN with randomly initialized weights, with the equivariant component input vector ${\bm{u}}_{i}^{l} \in {\mathbb{R}}^n$, equivariant output vector ${\bm{u}}_{i}^{l+1} \in {\mathbb{R}}^n$, and the multilayer perceptron $\phi_{u} : {\mathbb{R}}^m \to {\mathbb{R}}$, where the [...] Furthermore, given a full EGNN with $L$ layers, the expected value of the network output is appro[...]
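The theorem statement above is truncated, so the following is only a hedged illustration of the kind of initialization bias it concerns, not a reproduction of the result. The sketch assumes the commonly used EGNN coordinate update (input vector plus a weighted average of relative vectors) and assumes that the scalar output of $\phi_u$ is zero-mean under random initialization; both are assumptions of this sketch. The hypothetical `coord_update` models $\phi_u$ as a single random weight `w` times a fixed nonlinearity, and a Monte Carlo average over many draws of `w` shows that the mean output stays close to the input.

```python
import math
import random

def coord_update(u, w):
    """EGNN-style coordinate update with a toy phi_u (illustrative only).

    u: list of 2D equivariant vectors, one per node.
    w: a random scalar weight; phi_u = w * tanh(squared distance) is a
       hypothetical stand-in for a randomly initialized MLP with zero-mean output.
    """
    n = len(u)
    out = []
    for i in range(n):
        dx = dy = 0.0
        for j in range(n):
            if i == j:
                continue
            rx, ry = u[i][0] - u[j][0], u[i][1] - u[j][1]
            phi_u = w * math.tanh(rx * rx + ry * ry)
            dx += rx * phi_u
            dy += ry * phi_u
        out.append((u[i][0] + dx / (n - 1), u[i][1] + dy / (n - 1)))
    return out

random.seed(0)
u_in = [(1.0, 0.5), (-0.5, 2.0), (0.0, -1.0)]
trials = 20000

# Average the layer output over many random initializations of w.
sums = [[0.0, 0.0] for _ in u_in]
for _ in range(trials):
    w = random.gauss(0.0, 1.0)
    for k, (ox, oy) in enumerate(coord_update(u_in, w)):
        sums[k][0] += ox
        sums[k][1] += oy

mean_out = [(sx / trials, sy / trials) for sx, sy in sums]
max_dev = max(abs(m - v) for mo, ui in zip(mean_out, u_in) for m, v in zip(mo, ui))
print(mean_out, max_dev)  # mean output should be close to u_in
```

Under these assumptions, a freshly initialized layer returns, on average, the vector it was given. If that vector is an agent's position and the output is read as a movement action, the untrained policy would tend to push each agent along its own position vector, which is one plausible reading of the early-training behavior described in Figure 3.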

Figures (12)

  • Figure 1: An example of how using an equivariant function approximator shrinks the total search space.
  • Figure 2: An example of rotational equivariance/symmetry in the MPE simple spread environment. Note that as the agent positions (in red) are rotated, the optimal actions (arrows) are also rotated.
  • Figure 3: An example of biased learning in the MPE simple spread environment. Left: The observed behavior of the EGNN agents in the early training phase. Each agent moves away from the origin due to the EGNN bias. Right: Note the very low reward in early training steps, caused by the biased policies moving away from the goals.
  • Figure 4: An example of using an Equivariant Graph Neural Network in MARL. Note that the state must be structured as a graph. As discussed in the section on complex action spaces (sec:complex_act), the output of the policy uses $u_i$ for equivariant (typically spatial) actions and $h_i$ for discrete components of the actions.
  • Figure 5: Comparing PPO learning performance on MPE with various neural networks. Top: reward as a function of environment steps. Bottom: reward as a function of wall-clock time.
  • ...and 7 more figures
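The rotational symmetry described for Figure 2 can be illustrated with a toy example. The sketch below is not the MPE environment or the paper's policy: `greedy_action` is a hypothetical stand-in for an optimal action, defined as the unit vector from an agent toward its nearest landmark. Rotating the whole scene and then computing the action gives the same result as computing the action first and then rotating it.

```python
import math

def rotate(p, theta):
    """Rotate a 2D point or vector counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def greedy_action(agent, landmarks):
    """Unit vector from the agent toward its nearest landmark (toy policy)."""
    target = min(landmarks, key=lambda lm: math.dist(agent, lm))
    d = math.dist(agent, target)
    return ((target[0] - agent[0]) / d, (target[1] - agent[1]) / d)

agent = (1.0, 0.0)
landmarks = [(2.0, 1.0), (-1.0, -1.0), (0.5, 3.0)]
theta = math.pi / 3

# Rotate the scene first, then act.
action_in_rotated_scene = greedy_action(
    rotate(agent, theta), [rotate(lm, theta) for lm in landmarks]
)

# Act first, then rotate the action.
rotated_action = rotate(greedy_action(agent, landmarks), theta)

action_err = max(abs(a - b) for a, b in zip(action_in_rotated_scene, rotated_action))
print(action_err)  # should be at floating-point noise level
```

A policy network with this symmetry built in does not need to learn the rotated cases separately, which is the intuition behind the reduced search space shown in Figure 1.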

Theorems & Definitions (5)

  • Theorem 1
  • Corollary 1.1
  • Theorem 2
  • Corollary 2.1
  • Theorem 3