Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning
Michael Matthews, Michael Beukman, Benjamin Ellis, Mikayel Samvelyan, Matthew Jackson, Samuel Coward, Jakob Foerster
TL;DR
Craftax addresses the slowness of existing open-ended reinforcement learning benchmarks with a GPU-accelerated, JAX-based environment that preserves open-ended dynamics. The authors reimplement Crafter as Craftax-Classic, which runs up to 250x faster than the original, and add NetHack-inspired mechanics to form the main Craftax benchmark. They provide two symbolic-observation benchmarks, Craftax-1B and Craftax-1M, for studying long-horizon exploration under moderate compute. Experiments with PPO-based baselines, intrinsic rewards, and unsupervised environment design (UED) methods show that current approaches fail to make material progress on Craftax, which underscores the benchmark's difficulty and its usefulness for studying exploration, continual learning, and generalization. Craftax thus offers a practical platform for open-ended RL research that supports rapid iteration with limited computational resources.
Abstract
Benchmarks play a crucial role in the development and analysis of reinforcement learning (RL) algorithms. We identify that existing benchmarks used for research into open-ended learning fall into one of two categories. Either they are too slow for meaningful research to be performed without enormous computational resources, like Crafter, NetHack and Minecraft, or they are not complex enough to pose a significant challenge, like Minigrid and Procgen. To remedy this, we first present Craftax-Classic: a ground-up rewrite of Crafter in JAX that runs up to 250x faster than the Python-native original. A run of PPO using 1 billion environment interactions finishes in under an hour using only a single GPU and averages 90% of the optimal reward. To provide a more compelling challenge, we present the main Craftax benchmark, a significant extension of the Crafter mechanics with elements inspired by NetHack. Solving Craftax requires deep exploration, long-term planning and memory, as well as continual adaptation to novel situations as more of the world is discovered. We show that existing methods, including global and episodic exploration as well as unsupervised environment design, fail to make material progress on the benchmark. We believe that Craftax allows researchers, for the first time, to experiment in a complex, open-ended environment with limited computational resources.
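
To make the source of the speedup concrete, the sketch below shows the general pattern that JAX-based environments typically rely on: a pure single-environment step function is batched with jax.vmap and the whole rollout is compiled with jax.jit and jax.lax.scan, so many environments advance in lockstep on the accelerator. This is a toy grid world written for illustration, not Craftax code; the names step, rollout, GRID, NUM_ENVS and NUM_STEPS, as well as the random policy, are all assumptions made here.

```python
import jax
import jax.numpy as jnp

GRID = 8         # hypothetical board size for the toy world
NUM_ENVS = 1024  # hypothetical number of parallel environments
NUM_STEPS = 16   # hypothetical rollout length


def step(pos, action):
    """Move one agent on a GRID x GRID board; reward 1.0 when it stands on the far corner."""
    moves = jnp.array([[0, 1], [0, -1], [1, 0], [-1, 0]], dtype=jnp.int32)
    new_pos = jnp.clip(pos + moves[action], 0, GRID - 1)
    reward = jnp.all(new_pos == GRID - 1).astype(jnp.float32)
    return new_pos, reward


# vmap lifts the single-environment step to a batch of environments.
batched_step = jax.vmap(step)


@jax.jit
def rollout(key, init_pos):
    """Run NUM_STEPS random-policy steps in every environment inside one compiled program."""
    num_envs = init_pos.shape[0]

    def body(pos, step_key):
        # Sample one random action per environment (toy policy, not PPO).
        actions = jax.random.randint(step_key, (num_envs,), 0, 4)
        pos, reward = batched_step(pos, actions)
        return pos, reward

    step_keys = jax.random.split(key, NUM_STEPS)
    return jax.lax.scan(body, init_pos, step_keys)


init_pos = jnp.zeros((NUM_ENVS, 2), dtype=jnp.int32)
final_pos, rewards = rollout(jax.random.PRNGKey(0), init_pos)
print(final_pos.shape, rewards.shape)
```

Because the environment state never leaves the device and no Python code runs between steps, throughput in this style is limited mainly by the accelerator rather than by interpreter overhead, which is the property a Python-native environment lacks.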
