
Code2Worlds: Empowering Coding LLMs for 4D World Generation

Yi Zhang, Yunshuang Wang, Zeyu Zhang, Hao Tang

TL;DR

This work addresses the challenge of generating physically grounded 4D world dynamics from natural language. It introduces Code2Worlds, a language-to-simulation framework with two components: a dual-stream architecture that separates object-level detail from environmental orchestration, and a physics-aware closed loop in which a PostProcess Agent scripts dynamics and a VLM-Motion Critic iteratively refines the simulation code. The authors propose Code4D as a dedicated benchmark and report that Code2Worlds outperforms static or single-pass baselines on object fidelity, scene richness, and 4D dynamics, with a 41% SGS gain, 49% higher Richness, and low physics failure rates. The authors position the approach as a step toward safe and controllable sim-to-real embodied AI, acknowledge its computational overhead and biases, and point to future directions such as neural physics distillation to accelerate generation.

Abstract

Achieving spatial intelligence requires moving beyond visual plausibility to build world simulators grounded in physical laws. While coding LLMs have advanced static 3D scene generation, extending this paradigm to 4D dynamics remains a critical frontier. This task presents two fundamental challenges: multi-scale context entanglement, where monolithic generation fails to balance local object structures with global environmental layouts; and a semantic-physical execution gap, where open-loop code generation leads to physical hallucinations lacking dynamic fidelity. We introduce Code2Worlds, a framework that formulates 4D generation as language-to-simulation code generation. First, we propose a dual-stream architecture that disentangles retrieval-augmented object generation from hierarchical environmental orchestration. Second, to ensure dynamic fidelity, we establish a physics-aware closed-loop mechanism in which a PostProcess Agent scripts dynamics, coupled with a VLM-Motion Critic that performs self-reflection to iteratively refine simulation code. Evaluations on the Code4D benchmark show Code2Worlds outperforms baselines with a 41% SGS gain and 49% higher Richness, while uniquely generating physics-aware dynamics absent in prior static methods. Code: https://github.com/AIGeeksGroup/Code2Worlds. Website: https://aigeeksgroup.github.io/Code2Worlds.
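The closed-loop mechanism described in the abstract can be pictured as a generate, simulate, critique, and retry cycle. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `refine_closed_loop`, `Critique`, and the three callables are hypothetical, and the toy stubs stand in for the PostProcess Agent, the simulator, and the VLM-Motion Critic, none of which are specified in this summary.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Critique:
    """Hypothetical critic verdict: pass/fail plus textual feedback."""
    passed: bool
    feedback: str


def refine_closed_loop(
    prompt: str,
    script_dynamics: Callable[[str, str], str],  # (prompt, feedback) -> simulation code
    simulate: Callable[[str], str],              # code -> stand-in for a rendered result
    critic: Callable[[str, str], Critique],      # (prompt, rendering) -> verdict
    max_rounds: int = 3,                         # assumed retry budget
) -> Tuple[str, List[Critique]]:
    """Regenerate simulation code until the critic accepts it or the budget runs out."""
    feedback = ""
    code = ""
    history: List[Critique] = []
    for _ in range(max_rounds):
        code = script_dynamics(prompt, feedback)
        rendering = simulate(code)
        verdict = critic(prompt, rendering)
        history.append(verdict)
        if verdict.passed:
            break
        # Feed the critic's complaint into the next generation round.
        feedback = verdict.feedback
    return code, history


# Toy stand-ins (assumptions for illustration only).
def toy_script(prompt: str, feedback: str) -> str:
    gravity = -9.81 if "gravity" in feedback else 0.0
    return f"gravity = {gravity}"


def toy_simulate(code: str) -> str:
    return code  # a real system would run the code and render frames


def toy_critic(prompt: str, rendering: str) -> Critique:
    if "-9.81" in rendering:
        return Critique(True, "")
    return Critique(False, "objects float: enable gravity")


code, history = refine_closed_loop(
    "leaves falling", toy_script, toy_simulate, toy_critic
)
```

In this toy run the first round produces code without gravity, the critic rejects it, and the second round incorporates the feedback. An open-loop, single-pass generator would have stopped after the first round.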

Paper Structure (40 sections, 11 equations, 24 figures, 6 tables, 1 algorithm)

Figures (24)

  • Figure 1: Code2Worlds Execution Pipeline. The framework generates 4D scenes via a dual-stream architecture: 1) an Object Stream utilizing retrieval-augmented parameter generation with object self-reflection; 2) a Scene Stream employing hierarchical environmental orchestration; and 3) a refinement mechanism driven by a PostProcess Agent and self-reflection.
  • Figure 2: A detailed workflow for generating a 4D scene, integrating environmental scene, object generation, and feedback-driven refinement to ensure realistic scene rendering.
  • Figure 3: A series of environmental effects rendered in different scenes: 1) Relighting adjustments, 2) Water spill interaction, 3) Leaf fall simulation, 4) Jellyfish movement in an aquatic environment, and 5) Fire effect in a natural setting.
  • Figure 4: Example of leaf parameter
  • Figure 5: Example of jellyfish parameter
  • ...and 19 more figures