3D4D: An Interactive, Editable, 4D World Model via 3D Video Generation

Yunhong He, Zhengqing Yuan, Zhengzhong Tu, Yanfang Ye, Lichao Sun

TL;DR

3D4D addresses the challenge of enabling genuinely interactive 4D visualization within WebGL-based pipelines. It integrates Supersplat rendering with a four-module backend to transform static images and text into temporally coherent 4D scenes, and introduces a foveated rendering strategy guided by a Vision-Language Model to balance fidelity and efficiency. The approach supports real-time exploration and editing in the browser through a fully client-side video rendering pipeline, achieving strong semantic alignment and high frame rates. Reported results are CC=30.40, CS=0.9951, and 60 FPS with real-time interactivity, outperforming the existing approaches it is compared against.

Abstract

We introduce 3D4D, an interactive 4D visualization framework that integrates WebGL with Supersplat rendering. It transforms static images and text into coherent 4D scenes through four core modules and employs a foveated rendering strategy for efficient, real-time multi-modal interaction. This framework enables adaptive, user-driven exploration of complex 4D environments. The project page and code are available at https://yunhonghe1021.github.io/NOVA/.

Paper Structure

This paper contains 5 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Overview of 3D4D pipeline. The system integrates multi-modal inputs with real-time 4D rendering to support interactive exploration.
  • Figure 2: Illustrative input–prompt pair and evaluation axes. Upper left: the single panoramic photograph fed to DreamGen. Lower left: its accompanying natural-language prompt. This pair serves as a running example for the visualization results, where the generated 4D scene is assessed on the three WorldScore axes: Controllability, Quality, and Dynamics.
  • Figure 3: 3D4D’s adaptive frontend rendering: semantically important regions are rendered in high resolution, while peripheral areas are approximated to reduce cost.
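
The adaptive rendering idea in Figure 3 can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the screen is split into tiles, that some upstream model (for example the Vision-Language Model) has already produced a per-tile importance score in [0, 1], and that rendering cost scales with the square of the resolution scale. The function names, the `threshold` of 0.5, and the `low_scale` of 0.25 are all hypothetical choices made for illustration.

```python
from typing import List


def assign_tile_scales(importance: List[List[float]],
                       threshold: float = 0.5,
                       low_scale: float = 0.25) -> List[List[float]]:
    """Pick a resolution scale per tile.

    Tiles whose (hypothetical) importance score reaches the threshold are
    rendered at full resolution (1.0); all others use the reduced scale.
    """
    return [[1.0 if score >= threshold else low_scale for score in row]
            for row in importance]


def relative_pixel_cost(scales: List[List[float]]) -> float:
    """Fraction of full-resolution pixel work, assuming cost ~ scale squared."""
    tiles = [s for row in scales for s in row]
    return sum(s * s for s in tiles) / len(tiles)


# Hypothetical 3x3 importance map: only two tiles are semantically important.
importance = [
    [0.1, 0.2, 0.1],
    [0.3, 0.9, 0.4],
    [0.1, 0.6, 0.2],
]

scales = assign_tile_scales(importance)
cost = relative_pixel_cost(scales)
print(scales)
print(f"relative pixel cost: {cost:.3f}")
```

Under these assumptions, only the two tiles above the threshold are shaded at full resolution, so the estimated pixel work drops to roughly a quarter of a uniform full-resolution pass. A real frontend would likely blend scales smoothly across tile borders to avoid visible seams.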