stable-worldmodel-v1: Reproducible World Modeling Research and Evaluation

Lucas Maes, Quentin Le Lidec, Dan Haramati, Nassim Massaudi, Damien Scieur, Yann LeCun, Randall Balestriero

TL;DR

The paper addresses fragmentation and inconsistent benchmarking in world-model research by introducing stable-worldmodel (SWM), a modular ecosystem built around a World interface, a diverse environment suite with controllable factors of variation, and a standardized evaluation framework. It demonstrates SWM's utility by reproducing DINO-WM and evaluating it under both in-distribution and out-of-distribution conditions, which reveals limited zero-shot robustness to environmental variations. SWM supports reproducible experimentation, controlled robustness analysis, and standardized benchmarking, which could accelerate progress in controllable world models. The authors also outline future directions, including better debugging and interpretation tools, additional environments, and a community benchmark hosted on platforms such as Hugging Face.

Abstract

World Models have emerged as a powerful paradigm for learning compact, predictive representations of environment dynamics, enabling agents to reason, plan, and generalize beyond direct experience. Despite recent interest in World Models, most available implementations remain publication-specific, severely limiting their reusability, increasing the risk of bugs, and reducing evaluation standardization. To mitigate these issues, we introduce stable-worldmodel (SWM), a modular, tested, and documented world-model research ecosystem that provides efficient data-collection tools, standardized environments, planning algorithms, and baseline implementations. In addition, each environment in SWM enables controllable factors of variation, including visual and physical properties, to support robustness and continual learning research. Finally, we demonstrate the utility of SWM by using it to study zero-shot robustness in DINO-WM.
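
To make the idea of controllable factors of variation concrete, the following is a minimal toy sketch of an in-distribution versus out-of-distribution evaluation. It does not use or reproduce the SWM API: the `Factors` dataclass, the 1D reaching task, the proportional controller, and every parameter value are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Factors:
    """Hypothetical factors of variation for a toy 1D reaching task."""
    friction: float = 0.0  # physical property: fraction of each action that is lost


def run_episode(factors, policy_gain=0.5, steps=20, tol=0.05):
    """Roll out a fixed proportional controller; return True if the goal is reached."""
    pos, goal = 0.0, 1.0
    for _ in range(steps):
        action = policy_gain * (goal - pos)
        # Friction shrinks the effect of the action on the state.
        pos += (1.0 - factors.friction) * action
    return abs(goal - pos) < tol


def success_rate(factor_sampler, n_episodes=100, seed=0):
    """Average success over episodes whose factors are drawn by factor_sampler."""
    rng = random.Random(seed)
    wins = sum(run_episode(factor_sampler(rng)) for _ in range(n_episodes))
    return wins / n_episodes


# In-distribution: default factors, matching what the controller was tuned for.
in_dist = success_rate(lambda rng: Factors())
# Out-of-distribution: friction is varied while the controller stays fixed (zero-shot).
out_dist = success_rate(lambda rng: Factors(friction=rng.uniform(0.0, 0.95)))

print(f"in-distribution success:     {in_dist:.2f}")
print(f"out-of-distribution success: {out_dist:.2f}")
```

The controller is never adapted to the shifted factors, so the gap between the two success rates is a simple stand-in for the zero-shot robustness gap that the paper measures for DINO-WM.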

Paper Structure (22 sections, 2 figures, 2 tables)

Figures (2)

  • Figure 1: SWM Environment Suite. We support (and extend) a diverse set of established environments, including 2D and 3D settings with tasks in manipulation, navigation, and classic control. (a) Push-T (Chi et al., 2025): a manipulation task where a blue agent must push a T-shaped block to match the green anchor. (b) Two-Room (Sobal et al., 2025): a 2D navigation task where a red agent must navigate through a door to reach a green goal in the room. (c) DeepMind Control Suite (Tassa et al., 2018): a collection of 3D control tasks in MuJoCo. (d) OGBench (Park et al., 2025): a collection of 3D robotic manipulation tasks in MuJoCo. (Top) Default settings. (Bottom) All factors of variation applied, changing visual, geometric, and physical properties. All supported environments and their associated factors of variation (FoV) can be found in Figure \ref{fig:all-env-appendix} and Table \ref{table:all-fov}.
  • Figure 2: Visualization of the SWM environment suite.