Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning

Michael Matthews, Michael Beukman, Benjamin Ellis, Mikayel Samvelyan, Matthew Jackson, Samuel Coward, Jakob Foerster

TL;DR

Craftax tackles the bottleneck of slow, open-ended reinforcement learning benchmarks by delivering a fast, GPU-accelerated, JAX-based environment that preserves open-ended dynamics. By reimplementing Crafter as Craftax-Classic and adding NetHack-inspired mechanics to form Craftax, the authors demonstrate substantial speedups and provide two symbolic-observation benchmarks (Craftax-1B and Craftax-1M) to study long-horizon exploration under moderate compute. Through extensive experiments with PPO-based baselines, intrinsic rewards, and UED methods, the work reveals that current approaches struggle on Craftax, underscoring the benchmark's difficulty and utility for investigating exploration, continual learning, and generalization. Overall, Craftax offers a practical, scalable platform for advancing open-ended RL research, enabling rapid iteration and deeper evaluation of long-horizon, adaptive decision-making under real-world compute constraints.

Abstract

Benchmarks play a crucial role in the development and analysis of reinforcement learning (RL) algorithms. We identify that existing benchmarks used for research into open-ended learning fall into one of two categories. Either they are too slow for meaningful research to be performed without enormous computational resources, like Crafter, NetHack and Minecraft, or they are not complex enough to pose a significant challenge, like Minigrid and Procgen. To remedy this, we first present Craftax-Classic: a ground-up rewrite of Crafter in JAX that runs up to 250x faster than the Python-native original. A run of PPO using 1 billion environment interactions finishes in under an hour using only a single GPU and averages 90% of the optimal reward. To provide a more compelling challenge, we present the main Craftax benchmark, a significant extension of the Crafter mechanics with elements inspired by NetHack. Solving Craftax requires deep exploration, long-term planning and memory, as well as continual adaptation to novel situations as more of the world is discovered. We show that existing methods, including global and episodic exploration as well as unsupervised environment design, fail to make material progress on the benchmark. We believe that Craftax can, for the first time, allow researchers to experiment in a complex, open-ended environment with limited computational resources.
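
To put the headline number in perspective, 1 billion interactions in under an hour implies a sustained rate above roughly 278,000 environment steps per second (10^9 / 3600). The sketch below illustrates the general pattern that makes such throughput plausible in JAX: the step function is pure, so many environments can be batched with `jax.vmap` and the whole rollout compiled with `jax.jit`. The names `toy_step` and `rollout` and the 1-D walk environment are hypothetical stand-ins for illustration only; this is not the Craftax API or the authors' implementation.

```python
import jax
import jax.numpy as jnp


def toy_step(state, action):
    """Hypothetical toy environment: a walk on a 1-D line (not Craftax)."""
    # action 1 moves right, any other action moves left
    new_state = state + jnp.where(action == 1, 1, -1)
    # reward of 1.0 whenever the agent stands on position 5
    reward = (new_state == 5).astype(jnp.float32)
    return new_state, reward


def rollout(key, num_steps):
    """Run one episode with uniformly random actions; return its total reward."""

    def body(state, step_key):
        action = jax.random.randint(step_key, (), 0, 2)
        state, reward = toy_step(state, action)
        return state, reward

    step_keys = jax.random.split(key, num_steps)
    _, rewards = jax.lax.scan(body, jnp.int32(0), step_keys)
    return rewards.sum()


num_envs, num_steps = 1024, 64

# vmap batches the rollout over environments; jit compiles the batched program once.
batched_rollout = jax.jit(jax.vmap(lambda k: rollout(k, num_steps)))

env_keys = jax.random.split(jax.random.PRNGKey(0), num_envs)
returns = batched_rollout(env_keys)
print(returns.shape)  # (1024,)
```

Because the environment and the random policy live in the same compiled program, no data crosses the Python boundary between steps, which is the usual source of overhead in Python-native environments.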

Paper Structure (49 sections, 27 figures, 14 tables)

Figures (27)

  • Figure 1: Pixel-based view from Craftax.
  • Figure 2: Speed comparison with popular benchmarks for open-ended learning. Craftax-Classic and Craftax are 257x and 169x faster than Crafter respectively. Details of the speed test are in Appendix \ref{app:speed_comparison} and best-case results are in Table \ref{tab:speed_comparison}.
  • Figure 3: Rewards on Craftax-1B for PPO, PPO-RNN, ICM, E3B and RND. Each algorithm is run for 1 billion timesteps with 10 seeds. The shaded area denotes 1 standard error.
  • Figure 4: Achievement success rate on Craftax-1B split by achievement difficulty. Each algorithm is run on 10 seeds, with error bars showing 1 standard error.
  • Figure 5: Achievement success rate on Craftax-1B for selected achievements over 10 seeds, with error bars denoting 1 standard error.
  • ...and 22 more figures