stable-worldmodel-v1: Reproducible World Modeling Research and Evaluation
Lucas Maes, Quentin Le Lidec, Dan Haramati, Nassim Massaudi, Damien Scieur, Yann LeCun, Randall Balestriero
TL;DR
World-model research is fragmented across publication-specific codebases and evaluated on inconsistent benchmarks. The paper introduces Stable World Model (SWM), a modular ecosystem built around a World interface, a diverse environment suite with controllable factors of variation, and a standardized evaluation framework. The authors demonstrate SWM by reproducing DINO-WM and evaluating it under both in-distribution and out-of-distribution conditions, which reveals limited zero-shot robustness to environmental variations. SWM supports reproducible experimentation, controlled robustness analysis, and standardized benchmarking, which could accelerate progress on controllable world models. Planned future work includes better debugging and interpretation tools, additional environments, and a community benchmark hosted on platforms such as Hugging Face.
Abstract
World models have emerged as a powerful paradigm for learning compact, predictive representations of environment dynamics, enabling agents to reason, plan, and generalize beyond direct experience. Despite recent interest in world models, most available implementations remain publication-specific, which severely limits their reusability, increases the risk of bugs, and hinders standardized evaluation. To mitigate these issues, we introduce stable-worldmodel (SWM), a modular, tested, and documented world-model research ecosystem that provides efficient data-collection tools, standardized environments, planning algorithms, and baseline implementations. In addition, each environment in SWM exposes controllable factors of variation, including visual and physical properties, to support robustness and continual learning research. Finally, we demonstrate the utility of SWM by using it to study zero-shot robustness in DINO-WM.
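The kind of controlled robustness study described above can be illustrated with a toy sketch. It does not use the stable-worldmodel API: `PointMassEnv`, its `damping` factor, `model_predict`, `plan`, and `success_rate` are all hypothetical names invented for illustration, and the open-loop planner is a deliberately simple stand-in rather than the planning method used with DINO-WM. The idea is only to show the protocol: keep the model and planner fixed, change one factor of variation, and compare success rates.

```python
from dataclasses import dataclass

MAX_ACTION = 1.0   # per-step action bound (illustrative)
HORIZON = 5        # planning horizon in steps (illustrative)
TOLERANCE = 0.1    # distance to goal that counts as success (illustrative)


@dataclass
class PointMassEnv:
    """Toy 1-D environment; `damping` is its single factor of variation."""
    damping: float = 0.0
    pos: float = 0.0

    def reset(self) -> float:
        self.pos = 0.0
        return self.pos

    def step(self, action: float) -> float:
        # True dynamics: damping shrinks the effect of every action.
        self.pos += (1.0 - self.damping) * action
        return self.pos


def model_predict(pos: float, action: float) -> float:
    """Stand-in world model, exact only for the training condition (damping = 0)."""
    return pos + action


def plan(start: float, goal: float) -> list[float]:
    """Open-loop plan: roll the model forward, stepping greedily toward the goal."""
    actions, pred = [], start
    for _ in range(HORIZON):
        action = max(-MAX_ACTION, min(MAX_ACTION, goal - pred))
        actions.append(action)
        pred = model_predict(pred, action)
    return actions


def success_rate(damping: float, goals: list[float]) -> float:
    """Fraction of goals reached when the same plans run under a given damping."""
    successes = 0
    for goal in goals:
        env = PointMassEnv(damping=damping)
        pos = env.reset()
        for action in plan(pos, goal):
            pos = env.step(action)
        successes += abs(pos - goal) < TOLERANCE
    return successes / len(goals)


GOALS = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
in_dist = success_rate(damping=0.0, goals=GOALS)       # condition the model matches
out_of_dist = success_rate(damping=0.5, goals=GOALS)   # shifted factor of variation
print(f"in-distribution: {in_dist:.2f}, out-of-distribution: {out_of_dist:.2f}")
```

In this toy setting the model is exact at `damping=0.0`, so every goal is reached, while at `damping=0.5` each plan covers only half the intended distance and every goal is missed. Real world models degrade less abruptly, but the comparison has the same structure.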
