Multi-Agent Craftax: Benchmarking Open-Ended Multi-Agent Reinforcement Learning at the Hyperscale

Bassel Al Omari, Michael Matthews, Alexander Rutherford, Jakob Nicolaus Foerster

TL;DR

This work introduces Craftax-MA, a fast, open-ended multi-agent reinforcement learning benchmark built on Craftax, and Craftax-Coop, a cooperative variant with agent specializations and trading. Because both environments are written in JAX and run on hardware accelerators, they enable rapid experimentation on long-horizon coordination among heterogeneous agents. Empirical evaluations with MAPPO, IPPO, and PQN reveal that existing MARL methods struggle with long-horizon credit assignment and exploration, and that MAPPO attains the lowest final returns of the three on Craftax-Coop with 3 agents. Overall, Craftax-MA and Craftax-Coop offer a scalable, challenging benchmark suite intended to drive progress toward more adaptable and cooperative multi-agent systems.

Abstract

Progress in multi-agent reinforcement learning (MARL) requires challenging benchmarks that assess the limits of current methods. However, existing benchmarks often target narrow short-horizon challenges that do not adequately stress the long-term dependencies and generalization capabilities inherent in many multi-agent systems. To address this, we first present Craftax-MA: an extension of the popular open-ended RL environment, Craftax, that supports multiple agents and evaluates a wide range of general abilities within a single environment. Written in JAX, Craftax-MA is exceptionally fast, with a training run using 250 million environment interactions completing in under an hour. To provide a more compelling challenge for MARL, we also present Craftax-Coop, an extension introducing heterogeneous agents, trading and more mechanics that require complex cooperation among agents for success. We provide analysis demonstrating that existing algorithms struggle with key challenges in this benchmark, including long-horizon credit assignment, exploration and cooperation, and argue for its potential to drive long-term research in MARL.
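The speed claim above implies a lower bound on end-to-end training throughput. The short calculation below is a sketch of that arithmetic only: it assumes the run takes exactly one hour (the stated upper bound), and the variable names are illustrative. The abstract does not state the hardware or algorithm behind this figure, so the result should not be read as a measured number.

```python
# Lower bound on training throughput implied by
# "250 million environment interactions ... in under an hour".
# Assumption: we take exactly one hour as the upper bound on wall-clock time.
total_interactions = 250_000_000
max_seconds = 60 * 60  # one hour

# Finishing in less than an hour means throughput exceeds this value.
min_steps_per_second = total_interactions / max_seconds

print(f"Implied throughput: more than {min_steps_per_second:,.0f} interactions/second")
```

Since the run finishes in less than an hour, the true throughput is strictly higher than the value printed.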

Paper Structure

This paper contains 29 sections, 7 figures.

Figures (7)

  • Figure 1: Example pixel-based observation of Craftax-Coop with a summary of player specializations. We also provide a symbolic observation to focus research on multi-agent challenges.
  • Figure 2: Analysis of Craftax-MA's ability to scale to thousands of parallel environments and different agent population counts. All measurements were recorded while training IPPO on a single L40S GPU. Results are compared to training PPO on Craftax. Training throughput grows nearly linearly with the number of parallel environments on a log-log scale, while increasing the number of agents monotonically reduces the training throughput.
  • Figure 3: Comparison of training performance of MAPPO in Craftax-MA with (a) shared rewards and (b) individual rewards, for an increasing number of agents. Results are also compared with the final reward of PPO-RNN on Craftax-1B (Matthews et al., 2024). Increasing the number of agents decreases the obtained returns, but the gap in returns is narrower under the individual reward setting. The experiments were repeated for 3 seeds, with the shaded area and error bars denoting 1 standard error.
  • Figure 4: Collection rate of resources after training MAPPO using 1 billion environment interactions in the individual rewards setting. As the number of agents increases, the collection rate of resources consistently decreases. Experiments were repeated for 3 seeds, with the shaded area and error bars denoting 1 standard error.
  • Figure 5: Performance comparison of MAPPO, IPPO, and PQN on the Craftax-Coop environment with 3 agents. MAPPO produces the lowest final episodic returns of the three algorithms. Each algorithm is run for 1 billion timesteps with 3 seeds. The shaded area denotes 1 standard error.
  • ...and 2 more figures
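
Several captions above report shaded areas and error bars of 1 standard error over 3 seeds. As a minimal sketch of how such a band can be computed, the snippet below uses the sample standard deviation (n - 1 in the denominator) divided by the square root of the number of seeds. Both that convention and the function name are assumptions, since the captions do not say which estimator was used, and the returns are placeholder values chosen for easy arithmetic, not results from the paper.

```python
import math
import statistics

def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for per-seed values.

    Assumption: the standard error uses the sample standard deviation
    (Bessel's correction), which may differ from the paper's convention.
    """
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem

# Placeholder per-seed final returns (hypothetical, NOT taken from the paper).
seed_returns = [10.0, 12.0, 14.0]
mean, sem = mean_and_standard_error(seed_returns)

# A "1 standard error" band spans mean - sem to mean + sem.
lower, upper = mean - sem, mean + sem
print(f"mean={mean:.3f}, standard error={sem:.3f}, band=({lower:.3f}, {upper:.3f})")
```

With only 3 seeds, such a band is a rough indication of variability rather than a tight confidence interval.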