What if? Emulative Simulation with World Models for Situated Reasoning

Ruiping Liu; Yufan Chen; Yuheng Zhang; Junwei Zheng; Kunyu Peng; Chengzhi Wu; Chenguang Huang; Di Wen; Jiaming Zhang; Kailun Yang; Rainer Stiefelhagen

TL;DR

WanderDream is the first large-scale dataset designed for the emulative simulation of mental exploration, enabling models to reason without active exploration. Extensive experiments show that mental exploration is essential for situated reasoning and that WanderDream data transfer well to real-world scenarios.

Abstract

Situated reasoning often relies on active exploration, yet in many real-world scenarios such exploration is infeasible due to the physical constraints of robots or safety concerns for visually impaired users. Given only a limited observation, can an agent mentally simulate a future trajectory toward a target situation and answer spatial what-if questions? We introduce WanderDream, the first large-scale dataset designed for the emulative simulation of mental exploration, enabling models to reason without active exploration. WanderDream-Gen comprises 15.8K panoramic videos across 1,088 real scenes from HM3D, ScanNet++, and real-world captures, depicting imagined trajectories from current viewpoints to target situations. WanderDream-QA contains 158K question-answer pairs covering the starting states, paths, and end states along each trajectory, to comprehensively evaluate exploration-based reasoning. Extensive experiments with world models and MLLMs demonstrate (1) that mental exploration is essential for situated reasoning, (2) that world models achieve compelling performance on WanderDream-Gen, (3) that imagination substantially facilitates reasoning on WanderDream-QA, and (4) that WanderDream data exhibit remarkable transferability to real-world scenarios. The source code and all data will be released.

Paper Structure (22 sections, 1 equation, 24 figures, 9 tables, 2 algorithms)

Introduction
Related Work
WanderDream
WanderDream-Gen
WanderDream-QA
Data Quality Control and Statistics
Real-World Test Set
Frameworks and Metrics for Emulative Simulation
Experiments
Implementation Details
Results
Limitations and Future Work
Conclusion
Details of Data Generation
Video Generation for WanderDream-Gen
...and 7 more sections

Figures (24)

Figure 1: Emulative simulation with WanderDream. Putting oneself in the mental shoes of the agent to imagine the visual trajectory from the current perception $s_0$ toward the target situation $s_T$, and reasoning along the imagined path to answer "what-if" questions. Throughout the paper, green denotes the current state, while blue represents imagination.
Figure 2: Constraints of active exploration: robot embodiment limits (e.g., inability to climb stairs) and visually impaired users’ psychological safety barriers when encountering obstacles without tactile cues.
Figure 3: Two layers of mental imagination. Task-oriented instrumental simulation (left), such as Navigation World Models (Bar et al., 2025), and experience-oriented emulative simulation (right), empowered by the proposed WanderDream.
Figure 4: WanderDream-Gen. Top row: Object navigation as robotic situated path imagination in HM3D. Middle row: Human-situated perspective with direct interpolation as the shortest path when no non-traversable obstacles are present. Bottom row: Human-situated perspective with computed shortest path accounting for non-traversable obstacles (e.g., walls).
Figure 5: WanderDream-QA generation pipeline with an example in HM3D.
...and 19 more figures
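
The Figure 4 caption distinguishes two cases for human-situated trajectories: direct interpolation when nothing blocks the straight line, and a computed shortest path when non-traversable obstacles such as walls intervene. The sketch below illustrates that two-case logic on a toy 2D occupancy grid. It is not the authors' pipeline: the grid representation, the 4-connected breadth-first search, and the function names `line_cells`, `bfs_path`, and `plan_path` are all assumptions made for illustration only.

```python
from collections import deque

def line_cells(a, b):
    """Grid cells sampled along the straight segment from a to b (inclusive)."""
    (r0, c0), (r1, c1) = a, b
    n = max(abs(r1 - r0), abs(c1 - c0))
    if n == 0:
        return [a]
    return [(round(r0 + (r1 - r0) * t / n), round(c0 + (c1 - c0) * t / n))
            for t in range(n + 1)]

def bfs_path(grid, start, goal):
    """Shortest 4-connected path over free cells (0 = free, 1 = obstacle)."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    return None  # goal unreachable

def plan_path(grid, start, goal):
    """Hypothetical two-case planner: interpolate if clear, else search."""
    straight = line_cells(start, goal)
    if all(grid[r][c] == 0 for r, c in straight):
        return straight
    return bfs_path(grid, start, goal)

if __name__ == "__main__":
    open_room = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
    walled_room = [[0, 1, 0], [0, 1, 0], [0, 0, 0]]
    print(plan_path(open_room, (0, 0), (2, 2)))    # straight-line case
    print(plan_path(walled_room, (0, 0), (0, 2)))  # detour around the wall
```

In the open room the planner returns the interpolated diagonal; in the walled room the straight segment crosses an obstacle cell, so it falls back to the search. A real system would plan over a scene mesh or navigation mesh rather than a coarse grid.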
