Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks

Georgios Papoudakis, Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht

TL;DR

This work addresses the lack of standardized MARL benchmarks by systematically comparing nine MARL algorithms across 25 cooperative tasks. It also releases EPyMARL, a codebase that unifies the algorithm implementations, together with two sparse-reward environments (LBF and RWARE). The results show that CTDE methods, particularly MAPPO and QMIX, often outperform independent learners, while parameter sharing improves performance in many sparse-reward or large-scale tasks. The study gives practical guidance on choosing algorithms for different observability and reward structures, and its open-source tooling supports reproducible benchmarking and future method development in MARL.

Abstract

Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we provide a systematic evaluation and comparison of three different classes of MARL algorithms (independent learning, centralised multi-agent policy gradient, value decomposition) in a diverse range of cooperative multi-agent learning tasks. Our experiments serve as a reference for the expected performance of algorithms across different learning tasks, and we provide insights regarding the effectiveness of different learning approaches. We open-source EPyMARL, which extends the PyMARL codebase to include additional algorithms and allow for flexible configuration of algorithm implementation details such as parameter sharing. Finally, we open-source two environments for multi-agent research which focus on coordination under sparse rewards.

Paper Structure

This paper contains 68 sections, 4 equations, 11 figures, 29 tables.

Figures (11)

  • Figure 1: Illustrations of the open-sourced multi-agent environments (Christianos et al., 2020).
  • Figure 2: Normalised evaluation returns averaged over the tasks in all environments except matrix games. The shaded region represents the 95% confidence interval.
  • Figure 3: Normalised maximum returns averaged over all algorithms with/without parameter sharing (with standard error).
  • Figure 4: Environment renderings matching observations for (a) Level-Based Foraging and (b) Multi-Robot Warehouse.
  • Figure 5: Mean simulation time per step for all environments. Bars indicate standard deviations of simulation speed across all tasks within the environments.
  • ...and 6 more figures