MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator
Xuehai He, Shijie Zhou, Thivyanth Venkateswaran, Kaizhi Zheng, Ziyu Wan, Achuta Kadambi, Xin Eric Wang
TL;DR
MorphoSim addresses the need for programmable, multi-view 4D world models in robotics by integrating a language-driven interface with trajectory-guided 4D generation and editable 4D representations. The approach couples an LLM-based command parameterizer, a Scene Generator that uses trajectory-conditioned diffusion and dynamic 3D Gaussians, and a Scene Editor for object-level edits such as color changes, extraction, and removal, all while preserving temporal and multi-view coherence. Key contributions include the three-module architecture, trajectory-guided cross-attention mechanisms, a dynamic control submodule, and a static edit pathway with feature-field distillation, enabling both data generation and robust evaluation of visuomotor policies. The framework demonstrates high-fidelity 4D scenes and flexible edits on robotics-relevant scenarios, facilitating synthetic data creation, controlled perturbations for evaluation, and rapid task-variant construction.
Abstract
World models that support controllable and editable spatiotemporal environments are valuable for robotics, enabling scalable training data, reproducible evaluation, and flexible task design. While recent text-to-video models generate realistic dynamics, they are constrained to 2D views and offer limited interaction. We introduce MorphoSim, a language-guided framework that generates 4D scenes with multi-view consistency and object-level controls. From natural language instructions, MorphoSim produces dynamic environments where objects can be directed, recolored, or removed, and scenes can be observed from arbitrary viewpoints. The framework integrates trajectory-guided generation with feature field distillation, allowing edits to be applied interactively without full re-generation. Experiments show that MorphoSim maintains high scene fidelity while enabling controllability and editability. The code is available at https://github.com/eric-ai-lab/Morph4D.
