WonderWorld: Interactive 3D Scene Generation from a Single Image
Hong-Xing Yu, Haoyi Duan, Charles Herrmann, William T. Freeman, Jiajun Wu
TL;DR
WonderWorld generates interactive 3D scenes from a single image. It introduces FLAGS (Fast Layered Gaussian Surfels), a scene representation whose geometry-aware initialization enables sub-second per-layer optimization and sub-10-second per-scene generation on a single GPU. To reduce geometric seams between extrapolated scenes, it uses a training-free guided depth diffusion that conditions depth estimates on already-visible geometry. Users control camera paths and content prompts in real time to create connected, diverse worlds. The system is compared against strong baselines using quantitative metrics and human evaluation. Ablation studies indicate that layered surfels, geometry-based initialization, and depth guidance each contribute to quality and consistency, and the authors release full code to support reproducibility and adoption in VR, gaming, and creative design.
Abstract
We present WonderWorld, a novel framework for interactive 3D scene generation that enables users to interactively specify scene contents and layout and see the created scenes with low latency. The major challenge lies in achieving fast generation of 3D scenes. Existing scene generation approaches fall short on speed because they often require (1) progressively generating many views and depth maps, and (2) time-consuming optimization of the scene geometry representations. We introduce Fast Layered Gaussian Surfels (FLAGS) as our scene representation, along with an algorithm to generate it from a single view. Our approach does not need multiple views, and it leverages a geometry-based initialization that significantly reduces optimization time. Another challenge is generating coherent geometry that allows all scenes to be connected. We introduce guided depth diffusion, which allows partial conditioning of depth estimation. WonderWorld generates connected and diverse 3D scenes in less than 10 seconds on a single A6000 GPU, enabling real-time user interaction and exploration. We demonstrate the potential of WonderWorld for user-driven content creation and exploration in virtual environments. We release full code and software for reproducibility. Project website: https://kovenyu.com/WonderWorld/.
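To make the idea of a geometry-based initialization concrete, the sketch below back-projects a single-view depth map into per-pixel 3D positions, normals, and footprint scales that could seed a set of surfels before optimization. It is an illustration only: the abstract does not specify how FLAGS is initialized, so the pinhole camera model, the function name `init_surfels_from_depth`, the intrinsics `fx, fy, cx, cy`, and the one-pixel-footprint scale rule are all assumptions rather than the authors' implementation.

```python
import numpy as np


def init_surfels_from_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map into surfel positions, normals, and scales.

    Hypothetical helper: assumes a pinhole camera with focal lengths
    (fx, fy) and principal point (cx, cy), with z pointing forward.
    """
    h, w = depth.shape
    # Pixel coordinate grids: u indexes columns, v indexes rows.
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))

    # Pinhole back-projection of every pixel to camera space.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    positions = np.stack([x, y, depth], axis=-1)  # shape (h, w, 3)

    # Surface tangents from finite differences along columns and rows.
    du = np.gradient(positions, axis=1)
    dv = np.gradient(positions, axis=0)

    # Cross-product order chosen so normals point back toward the camera.
    normals = np.cross(dv, du)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True) + 1e-12

    # Assumed scale rule: each surfel covers roughly one pixel at its depth.
    scales = np.stack([depth / fx, depth / fy], axis=-1)

    return (positions.reshape(-1, 3),
            normals.reshape(-1, 3),
            scales.reshape(-1, 2))
```

For a flat wall at constant depth, every surfel produced this way faces the camera and has the same footprint. Starting from geometry that is already close to the estimated surface is the intuition behind why such an initialization can shorten optimization, as opposed to starting from random placement.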
