CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians

Yang Liu; He Guan; Chuanchen Luo; Lue Fan; Naiyan Wang; Junran Peng; Zhaoxiang Zhang

TL;DR

CityGS tackles the challenge of real-time, high-fidelity rendering for large-scale scenes by introducing a divide-and-conquer training framework that builds a global Gaussian prior and uses block-wise data/primitive division. A block-wise Level-of-Detail strategy compresses and selectively renders Gaussians based on frustum visibility and distance, enabling scalable, real-time performance. The method achieves state-of-the-art fidelity on MatrixCity while maintaining real-time speeds across varying scales, and demonstrates robust qualitative and quantitative results across multiple real-world datasets. The approach also supports seamless fusion across blocks and boundary continuity, with extensive ablations validating the key components and a suite of supplementary results.

Abstract

The advancement of real-time 3D scene reconstruction and novel view synthesis has been significantly propelled by 3D Gaussian Splatting (3DGS). However, effectively training large-scale 3DGS and rendering it in real-time across various scales remains challenging. This paper introduces CityGaussian (CityGS), which employs a novel divide-and-conquer training approach and Level-of-Detail (LoD) strategy for efficient large-scale 3DGS training and rendering. Specifically, the global scene prior and adaptive training data selection enable efficient training and seamless fusion. Based on fused Gaussian primitives, we generate different detail levels through compression, and realize fast rendering across various scales through the proposed block-wise detail level selection and aggregation strategy. Extensive experimental results on large-scale scenes demonstrate that our approach attains state-of-the-art rendering quality, enabling consistent real-time rendering of large-scale scenes across vastly different scales. Our project page is available at https://dekuliutesla.github.io/citygs/.
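The block-wise detail level selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Block`, `distance_to_box`, `select_level`, and `choose_blocks` are hypothetical, the distance thresholds are placeholder values, measuring distance to an axis-aligned bounding box is an assumption, and the frustum visibility test is left to a caller-supplied function rather than implemented here.

```python
import math
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical block record: a name plus an axis-aligned bounding box."""
    name: str
    bbox_min: tuple  # (x, y, z) lower corner
    bbox_max: tuple  # (x, y, z) upper corner


def distance_to_box(point, bbox_min, bbox_max):
    """Euclidean distance from a point to an axis-aligned box (0 if inside)."""
    squared = 0.0
    for p, lo, hi in zip(point, bbox_min, bbox_max):
        gap = max(lo - p, 0.0, p - hi)  # per-axis gap, 0 when p lies in [lo, hi]
        squared += gap * gap
    return math.sqrt(squared)


def select_level(distance, thresholds):
    """Map a distance to a level index; 0 is the finest (least compressed).

    `thresholds` is an ascending list of assumed distance cut-offs. Each
    cut-off reached moves the block one level coarser.
    """
    level = 0
    for cutoff in thresholds:
        if distance >= cutoff:
            level += 1
    return level


def choose_blocks(camera_pos, blocks, thresholds, is_visible):
    """Return {block name: level} for blocks that pass the visibility test.

    `is_visible` stands in for a frustum/bounding-box intersection test.
    All Gaussians in a block share the level chosen for that block.
    """
    selected = {}
    for block in blocks:
        if not is_visible(block):
            continue  # block is culled and never reaches the rasterizer
        d = distance_to_box(camera_pos, block.bbox_min, block.bbox_max)
        selected[block.name] = select_level(d, thresholds)
    return selected
```

With two thresholds there are three levels, matching the three compression rates shown in the paper's rendering figure. Picking one level per block, rather than per Gaussian, keeps the selection cost proportional to the number of blocks.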

Paper Structure (31 sections, 6 equations, 9 figures, 8 tables)

Introduction
Related Works
Neural Rendering
Neural Radiance Field
Point-based Rendering
Large Scale Scene Reconstruction
Level of Detail
Method
Preliminary
Training CityGS
Global Gaussian Prior Generation
Primitives and Data Division
Finetuning and Post-processing
Level-of-Detail on CityGS
Detail Level Generation
...and 16 more sections

Figures (9)

Figure 1: (a, b, c) Our proposed CityGS achieves SOTA rendering fidelity on the Small City scene (5620 training images, 740 test images) of the MatrixCity dataset. The setting of the baseline 3DGS$^\dagger$ is discussed in Section 4.1 (Setup). (d, e) Here Z denotes camera height. Without LoD, CityGS would render over 20 million points, leading to considerable VRAM and time costs. The LoD saves VRAM and enables real-time performance at various scales. Note that the FPS is measured with CUDA synchronization for objective evaluation.
Figure 2: The training process of CityGS. The pink square bounds the foreground area, facilitating subsequent unbounded space contraction and Gaussian partitioning. Then, for a specific block, a pose is assigned to the training set if it is inside the block or if the block contributes considerably to the rendering result. These blocks are then trained in parallel and merged together to depict the whole scene.
Figure 3: Rendering of CityGS. Based on the trained dense Gaussians, we generate detail levels with different compression rates $r_0$, $r_1$ and $r_2$. When rendering, all the Gaussians in the same block share the same detail level, which is determined by the block's distance to the camera. Since the contraction-based block partition leads to some irregular block shapes, we estimate their bounding boxes after removing floaters. The frustum's intersection with the estimated block shape determines whether the block is fed to the rasterizer.
Figure 4: Qualitative comparison with SOTA methods on real-scene datasets.
Figure 5: Qualitative comparison with SOTA methods on the MatrixCity dataset.
...and 4 more figures
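
The pose assignment rule in the Figure 2 caption (keep a pose if it lies inside the block, or if the block contributes considerably to that pose's rendering) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: `Pose`, `inside_box`, and `assign_poses` are hypothetical names, the contribution score is supplied by the caller because the caption does not say how it is measured, and the threshold value is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    """Hypothetical camera pose record: an identifier and a 3D position."""
    name: str
    position: tuple  # (x, y, z)


def inside_box(position, bbox_min, bbox_max):
    """True if the position lies within the axis-aligned box (inclusive)."""
    return all(lo <= p <= hi for p, lo, hi in zip(position, bbox_min, bbox_max))


def assign_poses(poses, bbox_min, bbox_max, contribution, threshold):
    """Return the names of poses to train one block on.

    `contribution(pose)` is an assumed caller-supplied score describing how
    much the block affects the image rendered from that pose. A pose is kept
    if it is inside the block or its score exceeds `threshold`.
    """
    kept = []
    for pose in poses:
        if inside_box(pose.position, bbox_min, bbox_max):
            kept.append(pose.name)  # camera sits inside the block
        elif contribution(pose) > threshold:
            kept.append(pose.name)  # outside, but the block is clearly seen
    return kept
```

The second condition matters for cameras that sit outside a block but look into it: dropping them would leave the block under-constrained near its boundary.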
