AerialGo: Walking-through City View Generation from Aerial Perspectives
Fuqiang Zhao, Yijing Guo, Siyuan Yang, Xi Chen, Luo Wang, Lan Xu, Yingliang Zhang, Yujiao Shi, Jingyi Yu
TL;DR
AerialGo addresses the privacy and scalability bottlenecks of city-scale 3D reconstruction by generating realistic ground-view images from aerial data using a multi-view diffusion framework conditioned on aerial references and 3D priors. It introduces a diffusion-based Aerial2Ground generator and integrates the generated priors into 3D Gaussian Splatting (3DGS) backbones, yielding improved ground-view fidelity and structural coherence. The paper also presents the AerialGo dataset, a large-scale collection of 3.45 million aerial and ground-view images covering 134 km² with depth and camera annotations for training and evaluation. In experiments, AerialGo demonstrates superior ground-level realism and occlusion handling, offering a privacy-preserving, scalable approach to city-scale 3D reconstruction and walk-through rendering.
Abstract
High-quality 3D urban reconstruction is essential for applications in urban planning, navigation, and AR/VR. However, capturing detailed ground-level data across cities is labor-intensive and raises significant privacy concerns, since such imagery contains sensitive information such as vehicle license plates, faces, and other personal identifiers. To address these challenges, we propose AerialGo, a novel framework that generates realistic walking-through city views from aerial images, leveraging multi-view diffusion models to achieve scalable, photorealistic urban reconstructions without direct ground-level data collection. By conditioning ground-view synthesis on accessible aerial data, AerialGo bypasses the privacy risks inherent in ground-level imagery. To support model training, we introduce the AerialGo dataset, a large-scale dataset of diverse aerial and ground-view images paired with camera and depth information and designed for generative urban reconstruction. Experiments show that AerialGo significantly enhances ground-level realism and structural coherence, providing a privacy-conscious, scalable solution for city-scale 3D modeling.
