Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks
Georgios Papoudakis, Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht
TL;DR
This work addresses the lack of standardised MARL benchmarks in two ways. It systematically compares nine MARL algorithms across 25 cooperative tasks. It also releases EPyMARL, which unifies algorithm implementations, together with two sparse-reward environments: Level-Based Foraging (LBF) and Multi-Robot Warehouse (RWARE). The comparison finds that methods using centralised training with decentralised execution (CTDE), particularly MAPPO and QMIX, often outperform independent learners, and that parameter sharing improves performance in many sparse-reward or large-scale tasks. The study offers practical guidance on choosing algorithms for different observability and reward structures. Its open-source tooling supports reproducible benchmarking and future method development.
Abstract
Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we provide a systematic evaluation and comparison of three different classes of MARL algorithms (independent learning, centralised multi-agent policy gradient, value decomposition) in a diverse range of cooperative multi-agent learning tasks. Our experiments serve as a reference for the expected performance of algorithms across different learning tasks, and we provide insights regarding the effectiveness of different learning approaches. We open-source EPyMARL, which extends the PyMARL codebase to include additional algorithms and allow for flexible configuration of algorithm implementation details such as parameter sharing. Finally, we open-source two environments for multi-agent research which focus on coordination under sparse rewards.
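Two of the ideas above are parameter sharing and value decomposition. The sketch below shows both under simple assumptions. A toy linear Q-function stands in for a neural network. All function and class names are hypothetical, and none of this is EPyMARL's API. Under parameter sharing, every agent queries one shared network and appends its own one-hot agent ID to its observation. The value decomposition here is additive, in the style of VDN, so that Q_tot is the sum of Q_i over agents. QMIX, another of the evaluated methods, replaces this sum with a monotonic mixing network conditioned on the global state.

```python
import random


def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


class LinearQ:
    """Toy linear Q-function (hypothetical stand-in for a neural network):
    one weight row per action, so Q(x, a) = w_a . x."""

    def __init__(self, input_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(input_dim)]
            for _ in range(n_actions)
        ]

    def q_values(self, x):
        """Return the list of Q-values for every action given input vector x."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]


def shared_agent_q_values(network, observations):
    """Parameter sharing: all agents use the same network. Each agent appends
    its one-hot ID so that agents can still act differently."""
    n_agents = len(observations)
    return [
        network.q_values(obs + one_hot(i, n_agents))
        for i, obs in enumerate(observations)
    ]


def vdn_joint_q(per_agent_q, joint_action):
    """Additive (VDN-style) decomposition: Q_tot = sum_i Q_i(o_i, a_i)."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))


def greedy_joint_action(per_agent_q):
    """Decentralised execution: each agent takes the argmax of its own Q_i."""
    return [max(range(len(q)), key=q.__getitem__) for q in per_agent_q]


if __name__ == "__main__":
    # Three agents, each with a 2-dimensional local observation.
    obs = [[0.5, -0.2], [0.1, 0.9], [-0.3, 0.4]]
    net = LinearQ(input_dim=2 + len(obs), n_actions=4)
    qs = shared_agent_q_values(net, obs)
    actions = greedy_joint_action(qs)
    print("greedy joint action:", actions)
    print("Q_tot:", vdn_joint_q(qs, actions))
```

The additive form is a deliberate choice. Q_tot is a sum of per-agent terms, so each agent maximising its own Q_i also maximises Q_tot. Agents can therefore act on local observations alone at execution time. QMIX keeps this property by constraining its mixing network to be monotonic in each Q_i.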
