3D4D: An Interactive, Editable, 4D World Model via 3D Video Generation
Yunhong He, Zhengqing Yuan, Zhengzhong Tu, Yanfang Ye, Lichao Sun
TL;DR
3D4D addresses the challenge of enabling genuinely interactive 4D visualization within WebGL-based pipelines. It integrates Supersplat rendering with a four-module backend that transforms static images and text into temporally coherent 4D scenes, and it introduces a foveated rendering strategy, guided by a Vision-Language Model, to balance fidelity and efficiency. A fully client-side video rendering pipeline supports real-time exploration and editing in the browser. The reported results, CC=30.40, CS=0.9951, and 60 FPS, indicate strong semantic alignment and high frame rates, outperforming existing approaches while remaining interactive in real time.
Abstract
We introduce 3D4D, an interactive 4D visualization framework that integrates WebGL with Supersplat rendering. It transforms static images and text into coherent 4D scenes through four core modules and employs a foveated rendering strategy for efficient, real-time multi-modal interaction. This framework enables adaptive, user-driven exploration of complex 4D environments. The project page and code are available at https://yunhonghe1021.github.io/NOVA/.
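The abstract does not spell out how the foveated rendering strategy allocates detail, so the following is only a minimal sketch of the general idea: keep full detail near a focus point and reduce it smoothly toward the periphery. Everything in it is an assumption for illustration, not the authors' implementation: the function names `foveation_weight` and `detail_fraction`, the normalized screen coordinates, the radii, the smoothstep falloff, and the minimum detail floor are all hypothetical. In 3D4D the focus region would presumably come from the Vision-Language Model; here it is simply passed in as a point.

```python
import math


def foveation_weight(px, py, fx, fy, inner_radius=0.15, outer_radius=0.6):
    """Detail weight in [0, 1] for a screen point (px, py) given a focus (fx, fy).

    Coordinates are assumed to be normalized to [0, 1]. The radii and the
    smoothstep falloff are illustrative choices, not values from the paper.
    """
    distance = math.hypot(px - fx, py - fy)
    if distance <= inner_radius:
        return 1.0  # inside the foveal region: full detail
    if distance >= outer_radius:
        return 0.0  # far periphery: lowest detail
    # Smoothstep interpolation between the two radii.
    t = (distance - inner_radius) / (outer_radius - inner_radius)
    return 1.0 - t * t * (3.0 - 2.0 * t)


def detail_fraction(px, py, focus, min_fraction=0.1):
    """Fraction of rendering detail to keep at (px, py).

    `min_fraction` is a hypothetical floor so the periphery is never
    dropped entirely.
    """
    fx, fy = focus
    weight = foveation_weight(px, py, fx, fy)
    return min_fraction + (1.0 - min_fraction) * weight


if __name__ == "__main__":
    focus = (0.5, 0.5)  # assumed to be supplied by a VLM in the real system
    for point in [(0.5, 0.5), (0.875, 0.5), (0.0, 0.0)]:
        print(point, round(detail_fraction(*point, focus), 3))
```

A renderer could use such a fraction to decide, per screen region, how much of the scene representation to draw at full quality, which is one way to trade fidelity against frame rate.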
