Generative Gaussian Splatting for Unbounded 3D City Generation
Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, Ziwei Liu
TL;DR
GaussianCity introduces a compact BEV-Point representation and a BEV-Point Decoder to enable unbounded 3D city generation with 3D Gaussian Splatting. By keeping only the visible BEV points and decoupling position-related from style-related attributes, it holds VRAM usage constant as the scene grows while achieving high realism and efficiency. It outperforms prior NeRF-based and 3D-GS-based methods, including a roughly 60x speedup over CityDreamer (10.72 FPS vs. 0.18 FPS). The approach is validated on GoogleEarth and KITTI-360, where it achieves state-of-the-art results in drone-view and street-view generation, and it is accompanied by thorough ablations and an analysis of limitations. This work enables scalable, real-time generation of large-scale city scenes, with practical implications for gaming, simulation, and VR/AR applications.
Abstract
3D city generation with NeRF-based methods shows promising generation results but is computationally inefficient. Recently, 3D Gaussian Splatting (3D-GS) has emerged as a highly efficient alternative for object-level 3D generation. However, adapting 3D-GS from finite-scale 3D objects and humans to infinite-scale 3D cities is non-trivial. Unbounded 3D city generation entails significant storage overhead (out-of-memory issues) arising from the need to expand points to billions, often demanding hundreds of gigabytes of VRAM for a city scene spanning 10 km². In this paper, we propose GaussianCity, a generative Gaussian Splatting framework dedicated to efficiently synthesizing unbounded 3D cities with a single feed-forward pass. Our key insights are two-fold: 1) Compact 3D Scene Representation: We introduce BEV-Point as a highly compact intermediate representation, ensuring that VRAM usage remains constant as the scene grows, thus enabling unbounded city generation. 2) Spatial-aware Gaussian Attribute Decoder: We present a spatial-aware BEV-Point decoder to produce 3D Gaussian attributes, which leverages a Point Serializer to integrate the structural and contextual characteristics of BEV points. Extensive experiments demonstrate that GaussianCity achieves state-of-the-art results in both drone-view and street-view 3D city generation. Notably, compared to CityDreamer, GaussianCity exhibits superior performance with a speedup of 60 times (10.72 FPS vs. 0.18 FPS).
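To make the storage argument and the reported speedup concrete, the following is a minimal back-of-the-envelope sketch, not part of the GaussianCity code. It assumes each Gaussian stores 14 float32 values (3 for position, 3 for scale, 4 for a rotation quaternion, 1 for opacity, 3 for RGB color); this layout and the function names are illustrative assumptions, and real implementations that add spherical-harmonic coefficients would need more memory per point.

```python
# Back-of-the-envelope estimates for the abstract's memory and speed claims.
# ASSUMPTION: 14 float32 attributes per Gaussian (position 3, scale 3,
# rotation quaternion 4, opacity 1, RGB 3). This layout is hypothetical and
# is not taken from the paper.

def gaussian_vram_gib(num_points: int,
                      floats_per_gaussian: int = 14,
                      bytes_per_float: int = 4) -> float:
    """Memory needed to hold the raw Gaussian attributes, in GiB."""
    total_bytes = num_points * floats_per_gaussian * bytes_per_float
    return total_bytes / 2**30


def speedup(fast_fps: float, slow_fps: float) -> float:
    """Ratio of two frame rates."""
    return fast_fps / slow_fps


if __name__ == "__main__":
    # Memory grows linearly with the number of stored points.
    for billions in (1, 2, 5):
        gib = gaussian_vram_gib(billions * 10**9)
        print(f"{billions} billion points: {gib:.1f} GiB")

    # Frame rates reported in the abstract: GaussianCity vs. CityDreamer.
    print(f"speedup: {speedup(10.72, 0.18):.1f}x")
```

Under these assumptions, storing every point of a city-scale scene costs tens of GiB per billion points, which is consistent with the hundreds of gigabytes cited above once point counts reach several billion. Restricting the representation to the points visible from the current view bounds `num_points` by the view rather than by the scene extent, which is the intuition behind keeping VRAM usage constant.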
