What if? Emulative Simulation with World Models for Situated Reasoning

Ruiping Liu, Yufan Chen, Yuheng Zhang, Junwei Zheng, Kunyu Peng, Chengzhi Wu, Chenguang Huang, Di Wen, Jiaming Zhang, Kailun Yang, Rainer Stiefelhagen

TL;DR

WanderDream is the first large-scale dataset designed for the emulative simulation of mental exploration, enabling models to reason without active exploration. Extensive experiments demonstrate that mental exploration is essential for situated reasoning and that WanderDream data transfer well to real-world scenarios.

Abstract

Situated reasoning often relies on active exploration, yet in many real-world scenarios such exploration is infeasible due to physical constraints of robots or safety concerns of visually impaired users. Given only a limited observation, can an agent mentally simulate a future trajectory toward a target situation and answer spatial what-if questions? We introduce WanderDream, the first large-scale dataset designed for the emulative simulation of mental exploration, enabling models to reason without active exploration. WanderDream-Gen comprises 15.8K panoramic videos across 1,088 real scenes from HM3D, ScanNet++, and real-world captures, depicting imagined trajectories from current viewpoints to target situations. WanderDream-QA contains 158K question-answer pairs, covering starting states, paths, and end states along each trajectory to comprehensively evaluate exploration-based reasoning. Extensive experiments with world models and MLLMs demonstrate (1) that mental exploration is essential for situated reasoning, (2) that world models achieve compelling performance on WanderDream-Gen, (3) that imagination substantially facilitates reasoning on WanderDream-QA, and (4) that WanderDream data exhibit remarkable transferability to real-world scenarios. The source code and all data will be released.

Paper Structure

This paper contains 22 sections, 1 equation, 24 figures, 9 tables, 2 algorithms.

Figures (24)

  • Figure 1: Emulative simulation with WanderDream. The model puts itself in the agent's position, imagines the visual trajectory from the current perception $s_0$ toward the target situation $s_T$, and reasons along the imagined path to answer "what-if" questions. Throughout the paper, green denotes the current state, while blue represents imagination.
  • Figure 2: Constraints of active exploration: robot embodiment limits (e.g., inability to climb stairs) and visually impaired users’ psychological safety barriers when encountering obstacles without tactile cues.
  • Figure 3: Two layers of mental imagination. Task-oriented instrumental simulation (left), such as Navigation World Models (Bar et al., 2025), and experience-oriented emulative simulation (right), empowered by the proposed WanderDream.
  • Figure 4: WanderDream-Gen. Top row: Object navigation as robotic situated path imagination in HM3D. Middle row: Human-situated perspective with direct interpolation as the shortest path when no non-traversable obstacles are present. Bottom row: Human-situated perspective with computed shortest path accounting for non-traversable obstacles (e.g., walls).
  • Figure 5: WanderDream-QA generation pipeline with an example in HM3D.
  • ...and 19 more figures