SA-GS: Semantic-Aware Gaussian Splatting for Large Scene Reconstruction with Geometry Constrain
Butian Xiong, Xiaoyu Ye, Tze Ho Elden Tse, Kai Han, Shuguang Cui, Zhen Li
TL;DR
This work addresses large-scale 3D scene reconstruction by incorporating semantic information into Gaussian Splatting to mitigate spurious ("fantasy") surfaces and geometric inconsistencies. It introduces SA-GS, a three-stage pipeline that (i) generates semantic masks and derives a target shape for each semantic group, guided by a geometric-complexity prior, (ii) enforces a soft, per-Gaussian regularizer during training, and (iii) applies hierarchical probability-density sampling to extract a geometry-faithful point cloud. Key contributions include a frequency-based perplexity measure $\mathbf{P_j}$ that bounds the number of splats in each semantic group, a soft regularization loss $\mathcal{L}_{gc}$, and a density-based point extraction scheme built on $\phi(x)$, all aimed at preserving semantic detail while controlling memory use. Experiments on GauU-Scene-based datasets and a campus dataset show significant improvements in geometric metrics over state-of-the-art Gaussian Splatting methods, with competitive image-based rendering quality. The approach enables detailed semantic queries and improved geometry, with practical value for large-scale scene understanding and downstream tasks; limitations include semantic consistency and the reliance on external masks.
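The TL;DR does not define how $\mathbf{P_j}$ is computed. As a rough illustration only, the sketch below assumes that a "frequency-based perplexity" is the exponentiated Shannon entropy of a semantic region's normalized 2D Fourier magnitude spectrum, and that each group's splat lower bound is a fixed total budget split in proportion to its perplexity. The function names `spectral_perplexity` and `splat_lower_bounds` and the `total_budget` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def spectral_perplexity(patch: np.ndarray, eps: float = 1e-12) -> float:
    """Exponentiated Shannon entropy of a patch's normalized 2D FFT magnitude spectrum.

    Hypothetical stand-in for the frequency-based perplexity P_j (assumed definition).
    """
    spectrum = np.abs(np.fft.fft2(patch.astype(np.float64)))
    total = spectrum.sum()
    if total < eps:
        return 1.0  # all-zero patch: treat as minimally complex
    p = spectrum.ravel() / total  # treat the magnitude spectrum as a distribution
    p = p[p > eps]                # drop numerically-zero bins before taking logs
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))


def splat_lower_bounds(perplexities, total_budget: int) -> list:
    """Split a hypothetical total splat budget across groups in proportion to perplexity."""
    weights = np.asarray(perplexities, dtype=np.float64)
    shares = weights / weights.sum()
    # Flooring keeps the sum of per-group bounds within the total budget
    return [int(np.floor(s * total_budget)) for s in shares]


# Usage: a flat region versus a noisy, texture-rich region
rng = np.random.default_rng(0)
flat = np.full((32, 32), 0.5)
textured = rng.random((32, 32))
perps = [spectral_perplexity(flat), spectral_perplexity(textured)]
bounds = splat_lower_bounds(perps, total_budget=100_000)
```

Under this assumed definition, a uniform region puts all spectral energy in the DC term and gets a perplexity of about 1, while a textured region spreads energy over many frequencies and therefore receives a much larger share of the splat budget.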
Abstract
With the emergence of Gaussian Splats, recent efforts have focused on large-scale geometric scene reconstruction. However, most of these efforts concentrate either on memory reduction or on spatial partitioning, neglecting information in the semantic space. In this paper, we propose a novel method, named SA-GS, for fine-grained 3D geometry reconstruction using semantic-aware 3D Gaussian Splats. Specifically, we leverage prior information stored in large vision models such as SAM and DINO to generate semantic masks. We then introduce a geometric complexity measurement function that serves as a soft regularizer, guiding the shape of each Gaussian Splat within specific semantic areas. Additionally, we present a method that estimates the expected number of Gaussian Splats in different semantic areas, effectively providing a lower bound on the number of Gaussian Splats in these areas. Finally, we use a novel probability-density-based extraction method to convert the Gaussian Splats into a point cloud, which is crucial for downstream tasks. Our method also offers the potential for detailed semantic queries while maintaining high-quality image-based reconstruction. We provide extensive experiments on publicly available large-scale scene reconstruction datasets that supply highly accurate ground-truth point clouds, as well as on our novel dataset. Our results show that our method outperforms current state-of-the-art Gaussian Splat reconstruction methods by a significant margin on geometry-based metrics. Code and additional results will soon be available on our project page.
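The abstract does not give the form of the density $\phi(x)$ or of the hierarchical sampler. The sketch below assumes a common construction: $\phi(x) = \sum_i \alpha_i \exp\left(-\tfrac{1}{2}(x-\mu_i)^\top \Sigma_i^{-1}(x-\mu_i)\right)$ with $\Sigma_i = R_i S_i S_i^\top R_i^\top$ as in standard 3D Gaussian Splatting; a two-level sampler that first picks a Gaussian with probability proportional to its opacity $\alpha_i$ and then draws a point from that Gaussian; and a threshold `tau` that discards low-density samples. All function names and parameters here are hypothetical, not the paper's implementation.

```python
import numpy as np


def quat_to_rotmat(q: np.ndarray) -> np.ndarray:
    """Convert (N, 4) quaternions in (w, x, y, z) order to (N, 3, 3) rotation matrices."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    w, x, y, z = q[:, 0], q[:, 1], q[:, 2], q[:, 3]
    return np.stack([
        np.stack([1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)], axis=-1),
        np.stack([2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)], axis=-1),
        np.stack([2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)], axis=-1),
    ], axis=1)


def density(points, means, rots, scales, opacities):
    """Assumed phi(x): opacity-weighted sum of unnormalized anisotropic Gaussian kernels."""
    diff = points[:, None, :] - means[None, :, :]             # (P, N, 3)
    local = np.einsum("nji,pnj->pni", rots, diff)             # R^T (x - mu) per Gaussian
    maha = np.sum((local / scales[None, :, :]) ** 2, axis=-1)  # squared Mahalanobis distance
    return np.sum(opacities[None, :] * np.exp(-0.5 * maha), axis=1)


def extract_points(means, quats, scales, opacities, n_samples, tau, seed=0):
    rng = np.random.default_rng(seed)
    rots = quat_to_rotmat(quats)
    # Level 1 (assumed weighting): choose Gaussians with probability proportional to opacity
    idx = rng.choice(len(means), size=n_samples, p=opacities / opacities.sum())
    # Level 2: sample within each chosen Gaussian, x = mu + R (s * z) with z ~ N(0, I)
    z = rng.standard_normal((n_samples, 3))
    pts = means[idx] + np.einsum("pij,pj->pi", rots[idx], scales[idx] * z)
    # Keep only samples whose mixture density reaches the threshold tau
    return pts[density(pts, means, rots, scales, opacities) >= tau]


# Usage: two flat, disk-like splats with different opacities
means = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
quats = np.array([[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]])
scales = np.array([[0.5, 0.5, 0.05], [0.5, 0.5, 0.05]])
opacities = np.array([0.9, 0.1])
cloud = extract_points(means, quats, scales, opacities, n_samples=2000, tau=0.05)
```

The dense `(P, N, 3)` difference array is only practical for toy inputs; a real scene with millions of splats would need to evaluate $\phi(x)$ over nearby Gaussians only, for example through a spatial index.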
