LivingWorld: Interactive 4D World Generation with Environmental Dynamics

Hyeongju Mun, In-Hwan Jin, Sohyeong Kim, Kyeongbo Kong

Abstract

We introduce LivingWorld, an interactive framework for generating 4D worlds with environmental dynamics from a single image. While recent advances in 3D scene generation enable large-scale environment creation, most approaches focus primarily on reconstructing static geometry, leaving scene-scale environmental dynamics such as clouds, water, or smoke largely unexplored. Modeling such dynamics is challenging because motion must remain coherent across an expanding scene while supporting low-latency user feedback. LivingWorld addresses this challenge by progressively constructing a globally coherent motion field as the scene expands. To maintain global consistency during expansion, we introduce a geometry-aware alignment module that resolves directional and scale ambiguities across views. We further represent motion using a compact hash-based motion field, enabling efficient querying and stable propagation of dynamics throughout the scene. This representation also supports bidirectional motion propagation during rendering, producing long and temporally coherent 4D sequences without relying on expensive video-based refinement. On a single RTX 5090 GPU, generating each new scene expansion step requires 9 seconds, followed by 3 seconds for motion alignment and motion field updates, enabling interactive 4D world generation with globally coherent environmental dynamics. Video demonstrations are available at cvsp-lab.github.io/LivingWorld.
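The abstract describes the hash-based motion field only at a high level. As a rough illustration of how such a structure could support cheap insertion and querying as the scene keeps expanding, the following sketch stores accumulated motion vectors in a fixed-size spatial hash keyed by quantized world coordinates; the class name, cell size, prime-multiply hashing, and weighted averaging are assumptions made for this example, not details taken from the paper.

```python
# Minimal sketch of a hash-based motion field (illustrative only; the paper's
# actual data structure and hashing scheme may differ).
import numpy as np

class HashMotionField:
    def __init__(self, cell_size=0.25, table_size=2**20):
        self.cell_size = cell_size    # edge length of a spatial cell, in world units
        self.table_size = table_size  # fixed hash-table capacity
        self.motion = np.zeros((table_size, 3), dtype=np.float32)  # accumulated motion vectors
        self.weight = np.zeros(table_size, dtype=np.float32)       # accumulation weights

    def _hash(self, cells):
        # Spatial hash of integer cell coordinates using a prime-multiply scheme
        # common in hash-grid implementations.
        primes = np.array([1, 2654435761, 805459861], dtype=np.uint64)
        c = cells.astype(np.int64).astype(np.uint64)
        h = np.bitwise_xor.reduce(c * primes, axis=-1)
        return (h % np.uint64(self.table_size)).astype(np.int64)

    def insert(self, points, motions, weights=None):
        # Accumulate aligned motion vectors from a new view into the field.
        if weights is None:
            weights = np.ones(len(points), dtype=np.float32)
        idx = self._hash(np.floor(points / self.cell_size))
        np.add.at(self.motion, idx, motions * weights[:, None])
        np.add.at(self.weight, idx, weights)

    def query(self, points):
        # Return the weight-normalized motion vector stored at each point's cell.
        idx = self._hash(np.floor(points / self.cell_size))
        w = np.maximum(self.weight[idx], 1e-6)[:, None]
        return self.motion[idx] / w
```

Under this sketch, a newly aligned view would contribute motion via `insert(points, motions)`, and rendering would call `query(points)` when propagating motion forward or backward in time; the paper's actual field may well use a learned or multi-resolution encoding instead.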

Paper Structure

This paper contains 33 sections, 23 equations, 10 figures, 5 tables, and 1 algorithm.

Figures (10)

  • Figure 1: LivingWorld generates a dynamic 4D world with environmental dynamics from a single image. Our geometry-aware alignment module maintains globally coherent scene dynamics as the world progressively expands.
  • Figure 2: Overall framework of the proposed method. Starting from the previous 4D scene, the environment is progressively expanded through camera movement and outpainting. Motion cues estimated from user-guided masks are aligned using a geometry-aware alignment module and integrated into a hash-based motion field, enabling bidirectional motion propagation for temporally coherent 4D scenes.
  • Figure 3: Geometry-Aware Alignment Module for resolving local ambiguity and propagating temporally consistent motion (an illustrative sketch of such an alignment follows this figure list).
  • Figure 4: Qualitative comparison under camera movement. (a) Veo 3.1 [wiedemer2025video], (b) CogVideoX [yang2024cogvideox], (c) Tora [zhang2025tora], and (d) Ours. Each column corresponds to the camera viewpoint indicated by the colored camera icons.
  • Figure 5: Qualitative comparison of motion consistency across different methods on the same input image. (a) Naive Scene Flow, (b) WonderWorld [yu2025wonderworld] + 3D-MOM, and (c) Ours.
  • ...and 5 more figures
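
Figures 2 and 3 refer to a geometry-aware alignment module that resolves directional and scale ambiguities before motion from a new view is merged into the global field. Purely as a loose illustration of one way such an alignment could be posed, the sketch below fits a rotation and a single scale between motion vectors at corresponding points in the overlap region (a Procrustes/Umeyama-style fit without translation); the function names and least-squares formulation are assumptions for this example, not the paper's formulation.

```python
# Illustrative only: align motion estimated in a newly generated view to the
# existing global motion field over the overlap region by fitting a rotation R
# and uniform scale s. Not the paper's actual module.
import numpy as np

def fit_motion_alignment(new_motion, ref_motion):
    """new_motion, ref_motion: (N, 3) motion vectors at corresponding overlap
    points. Returns (R, s) minimizing sum ||s * R @ new_i - ref_i||^2."""
    H = new_motion.T @ ref_motion                 # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # reflection correction
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                            # closest proper rotation
    s = np.trace(D @ np.diag(S)) / max(np.sum(new_motion ** 2), 1e-8)
    return R, s

def apply_alignment(motion, R, s):
    # Map per-point motion vectors from the new view into the global frame.
    return s * (motion @ R.T)
```

In practice the correspondences would come from the shared geometry between the previous scene and the newly outpainted region, and a robust, confidence-weighted variant would likely be needed; this sketch only conveys the kind of ambiguity being resolved (global direction and scale), not the paper's geometry-aware formulation.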