MetroGS: Efficient and Stable Reconstruction of Geometrically Accurate High-Fidelity Large-Scale Scenes

Kehua Chen, Tianlu Mao, Zhuxin Ma, Hao Jiang, Zehao Li, Zihan Liu, Shuqi Gao, Honglong Zhao, Feng Dai, Yucheng Zhang, Zhaoqi Wang

TL;DR

MetroGS tackles the challenge of geometrically faithful, large-scale scene reconstruction by combining a distributed 2D Gaussian Splatting backbone with a structured dense initialization, sparsity-aware densification, a progressive hybrid geometric refinement, and a depth-guided appearance model. The approach unifies geometry and appearance through a depth-consistent feature representation and per-image embeddings, enabling robust reconstruction under lighting and data variability. Key contributions include pointmap-assisted initialization, sparsity compensation densification, PatchMatch-based multi-view refinement, and depth-guided appearance modeling with Tri-Mip features, all trained in a scalable, GPU-friendly framework. Experiments on GauU-Scene and MatrixCity demonstrate superior geometric accuracy, rendering quality, and training efficiency compared with state-of-the-art baselines, highlighting practical impact for large-scale urban reconstruction.

Abstract

Recently, 3D Gaussian Splatting and its derivatives have achieved significant breakthroughs in large-scale scene reconstruction. However, achieving high-quality geometric fidelity efficiently and stably remains a core challenge. To address this issue, we introduce MetroGS, a novel Gaussian Splatting framework for efficient and robust reconstruction in complex urban environments. Our method is built upon a distributed 2D Gaussian Splatting representation, which serves as a unified backbone for the subsequent modules. To handle potentially sparse regions in complex scenes, we propose a structured dense enhancement scheme that uses SfM priors and a pointmap model to achieve a denser initialization, while incorporating a sparsity compensation mechanism to improve reconstruction completeness. Furthermore, we design a progressive hybrid geometric optimization strategy that integrates monocular and multi-view optimization to achieve efficient and accurate geometric refinement. Finally, to address the appearance inconsistency commonly observed in large-scale scenes, we introduce a depth-guided appearance modeling approach that learns spatial features with 3D consistency, facilitating effective decoupling between geometry and appearance and further enhancing reconstruction stability. Experiments on large-scale urban datasets demonstrate that MetroGS achieves superior geometric accuracy and rendering quality, offering a unified solution for high-fidelity large-scale scene reconstruction.

Paper Structure

This paper contains 34 sections, 13 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Illustration of the superiority of our method. (a) Our method accurately reconstructs the geometric structure of large-scale urban scenes, faithfully restoring fine details such as buildings, vegetation, and roads. (b) Compared with the SOTA method CityGSV2 liu2024citygaussianv2, our results are more complete and geometrically precise. (c) Benefiting from a well-designed training framework, our method achieves superior convergence speed and geometric quality. On four RTX 3090 GPUs, our method reaches better performance with less than 25% of the training time required by CityGSV2. The quantitative results are reported based on the quality metrics of Modern Building xiong2024gauu.
  • Figure 2: Overview. Starting with the input image sequences, we first utilize the prior information provided by SfM, combined with a pointmap model, to generate a high-quality initial point cloud. Next, an additional sparsity compensation optimization is introduced during the densification process to further refine sparse regions. We then combine monocular depth priors with multi-view consistency optimization to achieve progressive hybrid geometric refinement. Simultaneously, a depth-guided appearance modeling module is employed to decouple geometry and appearance, thereby enhancing reconstruction fidelity.
  • Figure 3: Visualization of hybrid multi-view refinement. (a) A strict geometric consistency check yields reliable PatchMatch-refined (PM-refined) depth. (b) and (c) show the restored refined depths, highlighting the effectiveness of patch-based alignment for local restoration.
  • Figure 4: Qualitative comparison on the MatrixCity li2023matrixcity dataset. Image rendering and mesh reconstruction are compared between our method and CityGSV2 liu2024citygaussianv2.
  • Figure 5: Qualitative results on the GauU-Scene xiong2024gauu dataset. We present the image and depth rendering results of our method compared with state-of-the-art methods.
  • ...and 5 more figures
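
The "strict geometric consistency" mentioned in the Figure 3 caption is not specified on this page. As a rough illustration only, the sketch below shows the generic forward-backward reprojection check commonly used in multi-view stereo to decide whether a depth value agrees across two views. It is not the MetroGS implementation: the function names, the pinhole intrinsics tuple, the `src_depth_at` lookup callback, and both thresholds are assumptions introduced for this example.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), assumed pinhole model


def backproject(u: float, v: float, depth: float, intr: Intrinsics) -> Vec3:
    """Lift pixel (u, v) with the given depth to a 3D point in camera coordinates."""
    fx, fy, cx, cy = intr
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def project(p: Vec3, intr: Intrinsics) -> Tuple[float, float]:
    """Project a 3D camera-space point (z > 0) to pixel coordinates."""
    fx, fy, cx, cy = intr
    return (fx * p[0] / p[2] + cx, fy * p[1] / p[2] + cy)


def transform(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Apply the rigid transform p' = R p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def invert_pose(R: Mat3, t: Vec3):
    """Invert a rigid transform: R_inv = R^T, t_inv = -R^T t."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = tuple(-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3))
    return Rt, t_inv


def is_consistent(
    u: float, v: float, ref_depth: float,
    src_depth_at: Callable[[float, float], Optional[float]],  # hypothetical depth lookup
    intr: Intrinsics, R: Mat3, t: Vec3,                       # pose maps ref -> src
    max_px: float = 1.0, max_rel_depth: float = 0.01,         # illustrative thresholds
) -> bool:
    """Forward-backward reprojection check for one reference pixel."""
    # Forward: reference pixel -> 3D point -> source image.
    p_src = transform(R, t, backproject(u, v, ref_depth, intr))
    if p_src[2] <= 0:
        return False
    us, vs = project(p_src, intr)
    d_src = src_depth_at(us, vs)
    if d_src is None or d_src <= 0:
        return False
    # Backward: source pixel with the source depth -> 3D point -> reference image.
    R_inv, t_inv = invert_pose(R, t)
    p_back = transform(R_inv, t_inv, backproject(us, vs, d_src, intr))
    if p_back[2] <= 0:
        return False
    ub, vb = project(p_back, intr)
    px_err = math.hypot(ub - u, vb - v)
    depth_err = abs(p_back[2] - ref_depth) / ref_depth
    return px_err <= max_px and depth_err <= max_rel_depth
```

A pixel passes only if it lands back near its starting location with nearly the same depth after the round trip. In practice such a check is usually vectorized over whole depth maps and required to hold in several source views, but those details depend on the actual method and are not given here.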