WonderTurbo: Generating Interactive 3D World in 0.72 Seconds
Chaojun Ni, Xiaofeng Wang, Zheng Zhu, Weijie Wang, Haoyun Li, Guosheng Zhao, Jie Li, Wenkang Qin, Guan Huang, Wenjun Mei
TL;DR
WonderTurbo tackles the bottleneck of real-time interactive 3D generation from a single image by jointly accelerating geometry and appearance modeling. It introduces StepSplat for fast, depth-guided geometry via a memory-augmented cost volume and incremental fusion, QuickDepth for lightweight depth completion, and FastPaint for 2-step diffusion-based appearance inpainting. Together, these enable interactive scene updates in 0.72 seconds, up to a 15× speedup over prior methods, while maintaining strong spatial consistency and rendering quality. The approach is compared against both offline and online baselines using CLIP-based metrics and user studies, and shows substantial speed gains with competitive or superior quality and coherence, making real-time interactive 3D editing from a single image practically feasible.
Abstract
Interactive 3D generation is gaining momentum and capturing extensive attention for its potential to create immersive virtual experiences. However, a critical challenge in current 3D generation technologies lies in achieving real-time interactivity. To address this issue, we introduce WonderTurbo, the first real-time interactive 3D scene generation framework capable of generating novel perspectives of 3D scenes within 0.72 seconds. Specifically, WonderTurbo accelerates both geometric and appearance modeling in 3D scene generation. In terms of geometry, we propose StepSplat, an innovative method that constructs efficient 3D geometric representations through dynamic updates, each taking only 0.26 seconds. Additionally, we design QuickDepth, a lightweight depth completion module that provides consistent depth input for StepSplat, further enhancing geometric accuracy. For appearance modeling, we develop FastPaint, a 2-step diffusion model tailored for instant inpainting, which focuses on maintaining spatial appearance consistency. Experimental results demonstrate that WonderTurbo achieves a remarkable 15× speedup compared to baseline methods, while preserving excellent spatial consistency and delivering high-quality output.
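To put the quoted timings in perspective, the short sketch below does the arithmetic they imply. Only the three constants come from the abstract; the function names are hypothetical, and both derived values are back-of-the-envelope inferences rather than reported measurements. In particular, it assumes the 0.26-second StepSplat update is counted inside the 0.72-second total and that the 15× speedup is measured against that same total.

```python
# Figures quoted in the abstract.
TOTAL_LATENCY_S = 0.72      # one interactive scene update
STEPSPLAT_LATENCY_S = 0.26  # one StepSplat geometry update
SPEEDUP = 15                # reported speedup over baseline methods


def remaining_budget(total_s: float, geometry_s: float) -> float:
    """Time left for everything except geometry (assumption: geometry
    time is included in the total, so the rest covers depth completion,
    inpainting, and any overhead)."""
    return round(total_s - geometry_s, 2)


def implied_baseline_latency(total_s: float, speedup: float) -> float:
    """Baseline latency implied by the speedup (assumption: the speedup
    is relative to the same end-to-end update time)."""
    return round(total_s * speedup, 2)


if __name__ == "__main__":
    print(remaining_budget(TOTAL_LATENCY_S, STEPSPLAT_LATENCY_S))
    print(implied_baseline_latency(TOTAL_LATENCY_S, SPEEDUP))
```

Under these assumptions, roughly 0.46 seconds per update remain for QuickDepth, FastPaint, and overhead, and the compared baselines would need on the order of 10.8 seconds per update.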
