EarthGen: Generating the World from Top-Down Views
Ansh Sharma, Albert Xiao, Praneet Rathi, Rohit Kundu, Albert Zhai, Yuan Shen, Shenlong Wang
TL;DR
EarthGen presents a scalable framework for generating infinite-size, high-resolution Earth imagery by combining a base latent diffusion model (LDM) with a cascade of scale-aware super-resolution (SR) models and a tiling-based mixture of diffusers. The approach produces coherent, ultra-high-resolution terrain across large geographic areas and supports interactive gigapixel exploration and downstream 3D scene generation. Key innovations include negative text conditioning to curb low-quality outputs, a mixture of diffusers to enforce continuity across tile boundaries, and a training pipeline that jointly tunes a VAE, the base LDM, and the SR cascade on Bing Maps data. The method outperforms state-of-the-art SR baselines in quality and realism on the extreme 1024x super-resolution task, with practical applications in controllable world design, environmental analytics, and asset creation, and the work is open-sourced.
Abstract
In this work, we present a novel method for extensive multi-scale generative terrain modeling. At the core of our model is a cascade of super-resolution diffusion models that can be combined to produce consistent images across multiple resolutions. Pairing this cascade with a tiled generation method yields a scalable system that can generate thousands of square kilometers of realistic Earth surfaces at high resolution. We evaluate our method on a dataset collected from Bing Maps and show that it outperforms super-resolution baselines on the extreme super-resolution task of 1024x zoom. We also demonstrate its ability to create diverse and coherent scenes via an interactive gigapixel-scale generated map. Finally, we demonstrate how our system can be extended to enable novel content creation applications, including controllable world generation and 3D scene generation.
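As a rough illustration of the tiled generation idea, the sketch below blends overlapping per-tile predictions into one seamless canvas using centre-peaked weights. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the tent-shaped weights, and the `predict_tile` callback (a hypothetical stand-in for a per-tile denoiser call) are all illustrative. In a mixture-of-diffusers style sampler, a blend of this kind would typically be applied to the tile predictions at every denoising step.

```python
def tile_origins(size, tile, stride):
    """Offsets of tiles of length `tile` that together cover [0, size)."""
    if not 1 <= stride <= tile <= size:
        raise ValueError("need 1 <= stride <= tile <= size")
    origins = list(range(0, size - tile + 1, stride))
    if origins[-1] != size - tile:
        origins.append(size - tile)  # extra tile flush with the far border
    return origins


def tent_weights(tile):
    """1D weights that peak at the tile centre and stay strictly positive."""
    return [min(i + 1, tile - i) for i in range(tile)]


def blend_tiles(height, width, tile, stride, predict_tile):
    """Weighted average of overlapping tile predictions.

    `predict_tile(y0, x0, tile)` is a hypothetical callback that returns a
    tile x tile nested list of values for the tile whose top-left corner
    is (y0, x0).
    """
    acc = [[0.0] * width for _ in range(height)]   # weighted sum of values
    norm = [[0.0] * width for _ in range(height)]  # sum of weights
    w1d = tent_weights(tile)
    for y0 in tile_origins(height, tile, stride):
        for x0 in tile_origins(width, tile, stride):
            patch = predict_tile(y0, x0, tile)
            for dy in range(tile):
                for dx in range(tile):
                    w = w1d[dy] * w1d[dx]  # separable 2D weight
                    acc[y0 + dy][x0 + dx] += w * patch[dy][dx]
                    norm[y0 + dy][x0 + dx] += w
    # Every pixel is covered by at least one tile, so norm is never zero.
    return [[acc[y][x] / norm[y][x] for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    # Toy "denoiser": each tile predicts its own row offset everywhere.
    canvas = blend_tiles(6, 6, 4, 2, lambda y0, x0, t: [[float(y0)] * t for _ in range(t)])
    for row in canvas:
        print([round(v, 2) for v in row])
```

Because pixels near a tile border receive low weight from that tile and high weight from its neighbours, adjacent tiles agree in their overlap region, which is what removes visible seams. The stride controls the trade-off: a smaller stride means more overlap and smoother transitions, at the cost of more tile evaluations.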
