WonderZoom: Multi-Scale 3D World Generation

Jin Cao, Hong-Xing Yu, Jiajun Wu

TL;DR

WonderZoom tackles the lack of scale-aware 3D generation by introducing scale-adaptive Gaussian surfels that grow incrementally and a progressive detail synthesizer that adds finer content conditioned on coarser geometry and user prompts. The method supports interactive zooming into any region, generating new, semantically coherent details across multiple scales while preserving cross-scale consistency and real-time rendering. Through extensive comparisons and ablations, WonderZoom outperforms state-of-the-art video and 3D generation baselines in both perceptual quality and prompt alignment, enabling truly multi-scale 3D world creation from a single image. The work enables immersive, editable virtual environments spanning from macro landscapes to micro details, with practical implications for content creation and exploration.

Abstract

We present WonderZoom, a novel approach to generating 3D scenes with contents across multiple spatial scales from a single image. Existing 3D world generation models remain limited to single-scale synthesis and cannot produce coherent scene contents at varying granularities. The fundamental challenge is the lack of a scale-aware 3D representation capable of generating and rendering content with widely different spatial sizes. WonderZoom addresses this through two key innovations: (1) scale-adaptive Gaussian surfels for generating and real-time rendering of multi-scale 3D scenes, and (2) a progressive detail synthesizer that iteratively generates finer-scale 3D contents. Our approach enables users to "zoom into" a 3D region and auto-regressively synthesize previously non-existent fine details, from landscapes to microscopic features. Experiments demonstrate that WonderZoom significantly outperforms state-of-the-art video and 3D models in both quality and alignment, enabling multi-scale 3D world creation from a single image. We show video results and an interactive viewer of generated multi-scale 3D worlds at https://wonderzoom.github.io/

Paper Structure

This paper contains 40 sections, 5 equations, 13 figures, 3 tables.

Figures (13)

  • Figure 1: Multi-scale 3D world generation from a single image. WonderZoom enables interactive exploration across spatial scales. Users can zoom into any region and specify prompts to generate new fine-scale content while maintaining cross-scale consistency. Here we show three zoom-in sequences. We attach an interactive viewer in the supplementary material.
  • Figure 2: WonderZoom overview. From an input image, we first reconstruct an initialized 3D scene. Users specify prompts and camera viewpoints to generate finer-scale content. Our progressive detail synthesizer creates new-scale images, registers depth to maintain geometric consistency, and synthesizes auxiliary views for complete 3D scene creation. Our scale-adaptive Gaussian surfels enable dynamic updates without re-optimization, seamlessly integrating new content while preserving real-time rendering.
  • Figure 3: Comparison of WonderZoom with baselines on multi-scale 3D world generation.
  • Figure 4: Qualitative results of WonderZoom on multi-scale 3D world generation.
  • Figure 5: Ablation on the opacity modulation.
  • ...and 8 more figures
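
The Figure 2 caption describes new-scale content being integrated into the scene "without re-optimization". The toy Python sketch below illustrates only that bookkeeping idea: finer-scale surfels are appended while existing ones stay untouched. It is not the authors' implementation. The names `Surfel`, `MultiScaleScene`, `synthesize_finer`, the `scale_level` tag, and the `shrink` factor are all hypothetical, and the generator is a geometric placeholder that ignores the prompt rather than calling any image or depth model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Surfel:
    position: Tuple[float, float, float]
    radius: float     # world-space extent of the surfel disk
    scale_level: int  # hypothetical tag: 0 = coarsest, larger = finer


@dataclass
class MultiScaleScene:
    surfels: List[Surfel] = field(default_factory=list)

    def add_scale(self, new_surfels: List[Surfel]) -> None:
        """Append finer-scale surfels; existing surfels are never modified."""
        self.surfels.extend(new_surfels)

    def finest_level(self) -> int:
        return max((s.scale_level for s in self.surfels), default=-1)


def synthesize_finer(scene: MultiScaleScene,
                     center: Tuple[float, float, float],
                     prompt: str,
                     shrink: float = 0.1) -> List[Surfel]:
    """Placeholder for a detail synthesizer (assumption, not the paper's model).

    Emits four smaller surfels around `center`, one level finer than the
    current finest level. `prompt` is accepted but unused in this toy version.
    """
    level = scene.finest_level() + 1
    parent_radius = min(
        (s.radius for s in scene.surfels if s.scale_level == level - 1),
        default=1.0,
    )
    r = parent_radius * shrink
    cx, cy, cz = center
    return [
        Surfel((cx + dx * r, cy + dy * r, cz), r, level)
        for dx in (-1, 1)
        for dy in (-1, 1)
    ]


# Usage: start from one coarse surfel, then zoom in twice.
scene = MultiScaleScene([Surfel((0.0, 0.0, 0.0), 1.0, 0)])
before = list(scene.surfels)
scene.add_scale(synthesize_finer(scene, (0.2, 0.1, 0.0), "moss on bark"))
scene.add_scale(synthesize_finer(scene, (0.21, 0.1, 0.0), "water droplet"))
```

Because `add_scale` only appends, the coarse-scale surfels captured in `before` remain identical after both zoom steps. A real system would additionally need depth registration and scale-dependent rendering, which this sketch does not attempt.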