BenchMARL: Benchmarking Multi-Agent Reinforcement Learning

Matteo Bettini; Amanda Prorok; Vincent Moens

TL;DR

BenchMARL is the first MARL training library built for standardized benchmarking across different algorithms, models, and environments. Its design supports systematic configuration and reporting, so users can create and run complex benchmarks from simple one-line inputs.

Abstract

The field of Multi-Agent Reinforcement Learning (MARL) is currently facing a reproducibility crisis. While solutions for standardized reporting have been proposed to address the issue, we still lack a benchmarking tool that enables standardization and reproducibility, while leveraging cutting-edge Reinforcement Learning (RL) implementations. In this paper, we introduce BenchMARL, the first MARL training library created to enable standardized benchmarking across different algorithms, models, and environments. BenchMARL uses TorchRL as its backend, granting it high performance and maintained state-of-the-art implementations while addressing the broad community of MARL PyTorch users. Its design enables systematic configuration and reporting, thus allowing users to create and run complex benchmarks from simple one-line inputs. BenchMARL is open-sourced on GitHub: https://github.com/facebookresearch/BenchMARL

Paper Structure (18 sections, 5 figures, 3 tables)

Introduction
Related work
BenchMARL
Components
Experiment.
Benchmark.
Algorithms.
Tasks.
Models.
Features
Documentation, tests, engineering.
Reporting.
Configuring.
Extending.
Callbacks and checkpointing.
...and 3 more sections

Figures (5)

Figure 1: BenchMARL enables comparisons across different Multi-Agent Reinforcement Learning (MARL) algorithms, models, and tasks while focusing on standardization and reproducibility.
Figure 2: BenchMARL execution diagram. Users run benchmarks as sets of experiments, where each experiment loads its components from the respective YAML configuration files.
Figure 3: Environments in BenchMARL. This figure shows renderings from one example task for each environment. Details and references for all environments are available in the paper's tasks table.
Figure 4: Benchmark results over VMAS tasks (Navigation, Sampling, Balance). All plots report the 95% stratified bootstrap confidence intervals over 3 random seeds for each experiment. The curves at the top report the inter-quartile mean (IQM). See Gorsane et al. (2022) and Agarwal et al. (2021) for more details on the reported metrics. Details and references for the algorithms used are available in the paper's algorithms table.
Figure 5: The sample efficiency curves for all BenchMARL algorithms over the three VMAS tasks analyzed. We report the inter-quartile mean (IQM) with 95% stratified bootstrap confidence intervals over 3 random seeds for each experiment. Details and references for the algorithms used are available in the paper's algorithms table.
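
The captions for Figures 4 and 5 report the inter-quartile mean (IQM) with 95% stratified bootstrap confidence intervals. The sketch below illustrates one way such a metric could be computed with the Python standard library only. It is not BenchMARL's reporting code: the function names, the trimming rule (dropping the lowest and highest 25% of sorted scores, rounded down), the percentile-based interval, and all score values are assumptions made for illustration.

```python
import random
from statistics import mean


def iqm(values):
    """Mean of the values left after trimming the lowest and highest 25%."""
    ordered = sorted(values)
    k = int(len(ordered) * 0.25)  # number of scores dropped at each end
    return mean(ordered[k:len(ordered) - k])


def stratified_bootstrap_ci(scores_by_task, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the IQM pooled over tasks.

    Runs are resampled with replacement inside each task (the strata),
    so every task keeps the same number of runs in each resample.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        pooled = []
        for runs in scores_by_task.values():
            pooled.extend(rng.choices(runs, k=len(runs)))
        estimates.append(iqm(pooled))
    estimates.sort()
    lower_idx = int((1 - level) / 2 * n_resamples)
    upper_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lower_idx], estimates[upper_idx]


# Hypothetical normalized returns: three tasks, three seeds each.
# The numbers are made up and are not results from the paper.
example_scores = {
    "navigation": [0.81, 0.77, 0.85],
    "sampling": [0.62, 0.70, 0.66],
    "balance": [0.90, 0.88, 0.93],
}

point_estimate = iqm([s for runs in example_scores.values() for s in runs])
lower, upper = stratified_bootstrap_ci(example_scores)
print(f"IQM = {point_estimate:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

With only three seeds per task, each resample draws from very few distinct runs, so an interval computed this way is best read as a rough indication of spread.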
