PG-SAG: Parallel Gaussian Splatting for Fine-Grained Large-Scale Urban Buildings Reconstruction via Semantic-Aware Grouping

Tengfei Wang, Xin Wang, Yongmao Hou, Yiwei Xu, Wendi Zhang, Zongqian Zhan

TL;DR

This work tackles the challenge of fine-grained building reconstruction in large-scale urban scenes with the memory- and time-intensive 3D Gaussian Splatting framework. It introduces PG-SAG, a semantic-aware grouping approach that partitions large scenes into subgroups using Language Segment Anything, enabling parallel optimization on high-resolution inputs without downsampling. Two novel losses—boundary-aware normal loss and gradient-constrained balance-load loss—address edge ambiguities and load balancing, improving reconstruction fidelity while reducing training time. Experimental results on GauU-Scene and DPCV demonstrate superior building surface reconstruction quality compared to state-of-the-art 3DGS methods and commercial tools, highlighting practical impact for scalable urban modeling and digital city applications, with code available at the provided repository.

Abstract

3D Gaussian Splatting (3DGS) has emerged as a transformative method in the field of real-time novel view synthesis. Building on 3DGS, recent work copes with large-scale scenes via spatial partitioning strategies that reduce video memory and optimization time costs. In this work, we introduce a parallel Gaussian splatting method, termed PG-SAG, which fully exploits semantic cues for both partitioning and Gaussian kernel optimization, enabling fine-grained building surface reconstruction of large-scale urban areas without downsampling the original image resolution. First, the cross-modal model Language Segment Anything is leveraged to segment building masks. Then, the segmented building regions are grouped into sub-regions according to a visibility check across registered images. The Gaussian kernels for these sub-regions are optimized in parallel using only the masked pixels. In addition, the normal loss is reformulated for the detected mask edges to alleviate the ambiguities in normal vectors at edges. Finally, to improve the optimization of 3D Gaussians, we introduce a gradient-constrained balance-load loss that accounts for the complexity of the corresponding scenes, effectively minimizing both the thread waiting time in the pixel-parallel rendering stage and the reconstruction loss. Extensive experiments on various urban datasets demonstrate the superior performance of PG-SAG on building surface reconstruction compared to several state-of-the-art 3DGS-based methods. Project page: https://github.com/TFWang-9527/PG-SAG.
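
To make "optimized ... with masked pixels" concrete, the following is a minimal sketch of a photometric loss restricted to building pixels. It is not the authors' implementation: the function name `masked_l1_loss`, the choice of an L1 term, and the nested-list image layout are all assumptions made for illustration.

```python
from typing import List

Image = List[List[float]]  # image[row][col], single-channel intensities (assumed layout)
Mask = List[List[bool]]    # mask[row][col], True marks a building pixel


def masked_l1_loss(rendered: Image, target: Image, mask: Mask) -> float:
    """Mean absolute error over masked pixels only (hypothetical helper).

    Pixels outside the mask contribute nothing, so background content
    cannot pull the optimization away from the building surface.
    """
    total = 0.0
    count = 0
    for r_row, t_row, m_row in zip(rendered, target, mask):
        for r, t, inside in zip(r_row, t_row, m_row):
            if inside:
                total += abs(r - t)
                count += 1
    # An empty mask yields zero loss rather than a division by zero.
    return total / count if count else 0.0
```

In a real training loop the same idea would normally be written with tensor operations so that gradients flow only through the masked pixels.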

Paper Structure (23 sections, 11 equations, 9 figures, 2 tables)


Figures (9)

  • Figure 1: Overall surface reconstruction results on the DPCV dataset, comparing other methods with our method using high-resolution images. Our PG-SAG at the original resolution generates the most detailed meshes. Moreover, at lower resolution it still clearly outperforms the other methods.
  • Figure 2: Incorrect building meshes produced by PGSR. Because the background (non-building areas) interferes with the foreground (building areas) when optimizing 3D Gaussians, building edges are reconstructed erroneously, as shown by the highlighted details within the red boxes.
  • Figure 3: Semantic-Aware Data Grouping Pipeline. The top-left part shows the coarse masks of buildings within the input images obtained using LSA. The top-right part illustrates multi-view voting filtering: only points with high confidence, i.e. those appearing in multiple building masks, are retained. The bottom part, from right to left, shows point cloud instance segmentation assisted by pre-trained Gaussian points, followed by reprojection to obtain mask points. In the final step, SAM2 is applied to extract refined building masks.
  • Figure 4: Comparison of different segmentation methods. LSA (lang-segment-anything) confuses the ground with buildings, resulting in inaccurate masks. PG-SAG can not only obtain complete building masks, but also obtain fine boundaries.
  • Figure 5: Sample images from Russian Building, Modern Building and DPCV Dataset.
  • ...and 4 more figures
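
The multi-view voting filter described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `PinholeCamera` class, the world-to-camera convention (x_cam = R x_world + t), the function names, and the `min_votes` threshold are all hypothetical. A point earns one vote for each view in which it projects onto a building-mask pixel, and only points with enough votes are kept.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Mask = List[List[bool]]  # mask[row][col], True marks a building pixel


class PinholeCamera:
    """Hypothetical minimal camera: x_cam = R @ x_world + t, then pinhole projection."""

    def __init__(self, fx: float, fy: float, cx: float, cy: float,
                 R: Sequence[Sequence[float]], t: Sequence[float]) -> None:
        self.fx, self.fy, self.cx, self.cy = fx, fy, cx, cy
        self.R, self.t = R, t

    def project(self, p: Point) -> Optional[Tuple[float, float]]:
        # Transform the world point into the camera frame.
        xc = [sum(self.R[i][j] * p[j] for j in range(3)) + self.t[i]
              for i in range(3)]
        if xc[2] <= 0.0:
            return None  # behind the camera, cannot be observed
        u = self.fx * xc[0] / xc[2] + self.cx
        v = self.fy * xc[1] / xc[2] + self.cy
        return u, v


def count_mask_votes(point: Point, cameras: Sequence[PinholeCamera],
                     masks: Sequence[Mask]) -> int:
    """Count the views in which `point` lands on a building-mask pixel."""
    votes = 0
    for cam, mask in zip(cameras, masks):
        uv = cam.project(point)
        if uv is None:
            continue
        col, row = int(round(uv[0])), int(round(uv[1]))
        inside = 0 <= row < len(mask) and 0 <= col < len(mask[0])
        if inside and mask[row][col]:
            votes += 1
    return votes


def filter_points_by_voting(points: Sequence[Point],
                            cameras: Sequence[PinholeCamera],
                            masks: Sequence[Mask],
                            min_votes: int = 2) -> List[Point]:
    """Keep only points supported by at least `min_votes` building masks."""
    return [p for p in points
            if count_mask_votes(p, cameras, masks) >= min_votes]
```

This sketch ignores occlusion: a point hidden behind another surface still votes if it projects into a mask. A fuller version would add a depth or visibility test before counting the vote.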