CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes

Yang Liu, Chuanchen Luo, Zhongkai Mao, Junran Peng, Zhaoxiang Zhang

TL;DR

CityGaussianV2 targets geometrically accurate, scalable reconstruction of large-scale scenes by extending 2D Gaussian Splatting with depth supervision, a Decomposed-Gradient Densification strategy, and an Elongation Filter that controls Gaussian proliferation. The method uses a highly parallel training pipeline, degree-2 spherical harmonics (SH) from the start of training, and per-block trimming with vectree quantization to reduce training time and memory while improving geometric fidelity. It also establishes a Tanks and Temples-style geometry benchmark with visibility-based crop volume estimation to enable robust comparisons in unbounded scenes. Experimental results on GauU-Scene and MatrixCity show strong geometry improvements with competitive rendering and substantial efficiency gains, including a smaller variant that significantly lowers computational costs.

Abstract

Recently, 3D Gaussian Splatting (3DGS) has revolutionized radiance field reconstruction, manifesting efficient and high-fidelity novel view synthesis. However, accurately representing surfaces, especially in large and complex scenarios, remains a significant challenge due to the unstructured nature of 3DGS. In this paper, we present CityGaussianV2, a novel approach for large-scale scene reconstruction that addresses critical challenges related to geometric accuracy and efficiency. Building on the favorable generalization capabilities of 2D Gaussian Splatting (2DGS), we address its convergence and scalability issues. Specifically, we implement a decomposed-gradient-based densification and depth regression technique to eliminate blurry artifacts and accelerate convergence. To scale up, we introduce an elongation filter that mitigates Gaussian count explosion caused by 2DGS degeneration. Furthermore, we optimize the CityGaussian pipeline for parallel training, achieving up to 10$\times$ compression, at least 25% savings in training time, and a 50% decrease in memory usage. We also establish standard geometry benchmarks for large-scale scenes. Experimental results demonstrate that our method strikes a promising balance among visual quality, geometric accuracy, and storage and training costs. The project page is available at https://dekuliutesla.github.io/CityGaussianV2/.
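
The densification rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Surfel` class, the `axis_ratio` and `densify_mask` helpers, and both threshold values are hypothetical names and numbers chosen for the example. It assumes each 2D Gaussian carries two in-plane scales and an accumulated gradient norm computed from the SSIM loss term only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Surfel:
    """Hypothetical stand-in for one 2D Gaussian primitive."""
    scale_u: float         # extent along the first in-plane axis
    scale_v: float         # extent along the second in-plane axis
    ssim_grad_norm: float  # accumulated gradient norm from the SSIM loss only


def axis_ratio(s: Surfel) -> float:
    """Shorter axis divided by longer axis; values near 0 mean needle-like."""
    longer = max(s.scale_u, s.scale_v)
    if longer <= 0.0:
        return 0.0
    return min(s.scale_u, s.scale_v) / longer


def densify_mask(surfels: List[Surfel],
                 grad_threshold: float,
                 min_axis_ratio: float) -> List[bool]:
    """Select primitives to clone or split.

    A primitive is densified only if (1) its SSIM-only gradient exceeds
    the threshold and (2) it is not extremely elongated.
    """
    return [
        s.ssim_grad_norm > grad_threshold and axis_ratio(s) >= min_axis_ratio
        for s in surfels
    ]


# Illustrative values only; the thresholds are assumptions, not the paper's.
surfels = [
    Surfel(1.0, 0.8, 0.5),    # high gradient, well shaped   -> densify
    Surfel(1.0, 0.001, 0.9),  # high gradient, needle-like   -> filtered out
    Surfel(1.0, 1.0, 0.01),   # low gradient                 -> left alone
]
mask = densify_mask(surfels, grad_threshold=0.1, min_axis_ratio=0.01)
```

The second primitive shows the point of the filter: its gradient alone would trigger densification, but its extreme elongation excludes it, which is how the count is kept from exploding.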

Paper Structure

This paper contains 19 sections, 4 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: Illustration of the superiority of CityGaussianV2. (a) Our method reconstructs large-scale complex scenes with accurate geometry from multi-view RGB images, restoring intricate structures of woods, buildings, and roads. (b) "Ours-coarse" denotes training 2DGS with our optimization algorithm. This strategy accelerates 2DGS reconstruction in terms of both rendering quality (PSNR, SSIM) and geometry accuracy (F1 score). (c) Our optimized parallel training pipeline reduces the training time and memory by 25% and 50% respectively, while achieving better geometric quality. We report mean quality metrics on GauU-Scene [xiong2024gauuscene], with the best performance in each column highlighted in bold.
  • Figure 2: Illustration of our optimization mechanism. We densify Gaussians exclusively according to the gradient of the SSIM loss. This helps remove large and blurry Gaussians and accelerates convergence. Meanwhile, we disable the densification of Gaussians with extreme elongation to avoid the Gaussian count explosion shown in Figure 3. We also supervise the rendered depth with the depth predicted by Depth Anything V2 [yang2024depth]. This helps improve both rendering and geometry quality.
  • Figure 3: Illustration of the motivation and effectiveness of our Elongation Filter. We take the tuning of one block of the Rubble [turki2022mega] scene as an example. On the left, we highlight the collections of Gaussian primitives with high gradient or extreme elongation. There is a significant overlap between the two collections. By restricting densification of these sand-like points, we prevent out-of-memory (OOM) errors caused by an explosion in Gaussian count, enabling a steady count evolution analogous to CityGaussian [liu2024citygaussian] in parallel tuning, as depicted on the right.
  • Figure 4: Illustration of pipeline modification. The pipeline of CityGS [liu2024citygaussian] (dashed boxes and arrows) is compared with ours. We successfully removed time-consuming post-pruning and distillation, while enabling storage compression for 2DGS.
  • Figure 5: Illustration of the evaluation process.
  • ...and 6 more figures
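
The depth supervision mentioned in the Figure 2 caption can be illustrated with a small sketch. Monocular depth predictions are often defined only up to an unknown scale and shift, so a common practice is to align them to the rendered depth by least squares before computing a loss. Whether CityGaussianV2 uses exactly this alignment and an L1 penalty is an assumption here; the function names `fit_scale_shift` and `depth_l1_loss` are hypothetical, and depths are given as flat lists of per-pixel values for simplicity.

```python
from typing import List, Tuple


def fit_scale_shift(mono: List[float], rendered: List[float]) -> Tuple[float, float]:
    """Least-squares fit of (a, b) minimizing sum((a * mono + b - rendered)^2)."""
    n = len(mono)
    mean_m = sum(mono) / n
    mean_r = sum(rendered) / n
    var_m = sum((m - mean_m) ** 2 for m in mono)
    cov = sum((m - mean_m) * (r - mean_r) for m, r in zip(mono, rendered))
    # Degenerate case: constant prediction, fall back to a pure shift.
    a = cov / var_m if var_m > 0.0 else 0.0
    b = mean_r - a * mean_m
    return a, b


def depth_l1_loss(mono: List[float], rendered: List[float]) -> float:
    """Mean absolute error between rendered depth and the aligned prior."""
    a, b = fit_scale_shift(mono, rendered)
    return sum(abs(a * m + b - r) for m, r in zip(mono, rendered)) / len(mono)


# Toy example: the rendered depth is exactly 2 * mono + 1, so after
# alignment the residual vanishes.
mono_depth = [1.0, 2.0, 3.0, 4.0]
rendered_depth = [3.0, 5.0, 7.0, 9.0]
scale, shift = fit_scale_shift(mono_depth, rendered_depth)
loss = depth_l1_loss(mono_depth, rendered_depth)
```

In a real training loop the same computation would run on image tensors with gradients flowing only into the rendered depth; the pure-Python version above is meant to show the arithmetic.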