RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields

Mihnea-Bogdan Jurca, Remco Royen, Ion Giosan, Adrian Munteanu

TL;DR

RT-GS2 tackles generalizable semantic segmentation for 3D Gaussian Splatting representations of radiance fields. It proposes a three-stage framework: a self-supervised view-independent 3D Gaussian feature extractor trained on full Gaussian scenes, a rendering step to produce view-specific information, and a View-Dependent / View-Independent (VDVI) fusion module that combines multi-scale features to generate semantic maps. The approach achieves state-of-the-art generalization on Replica and ScanNet while delivering real-time performance at 27.03 FPS, yielding up to a 901x speedup over prior methods and enabling practical downstream use. Ablation studies confirm the benefits of view-independent 3D features, the VDVI fusion, and the specialized loss terms for generalization, with depth-prediction results demonstrating robustness of the learned 3D features. Overall, RT-GS2 represents a significant step toward real-time, scene-generalizable semantic understanding in Gaussian-based radiance-field representations.

Abstract

Gaussian Splatting has revolutionized the world of novel view synthesis by achieving high rendering performance in real time. Recently, studies have focused on enriching these 3D representations with semantic information for downstream tasks. In this paper, we introduce RT-GS2, the first generalizable semantic segmentation method employing Gaussian Splatting. While existing Gaussian Splatting-based approaches rely on scene-specific training, RT-GS2 demonstrates the ability to generalize to unseen scenes. Our method adopts a new approach by first extracting view-independent 3D Gaussian features in a self-supervised manner, followed by a novel View-Dependent / View-Independent (VDVI) feature fusion to enhance semantic consistency over different views. Extensive experimentation on three different datasets showcases RT-GS2's superiority over the state-of-the-art methods in semantic segmentation quality, exemplified by an 8.01% increase in mIoU on the Replica dataset. Moreover, our method achieves real-time performance of 27.03 FPS, marking an astonishing 901 times speedup compared to existing approaches. This work represents a significant advancement in the field by introducing, to the best of our knowledge, the first real-time generalizable semantic segmentation method for 3D Gaussian representations of radiance fields.
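
For readers unfamiliar with the mIoU metric quoted in the abstract, the following is a minimal sketch of how mean intersection-over-union is commonly computed from per-pixel class labels. The function name `mean_iou`, the flat-list input format, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration; the paper's exact evaluation protocol is not described in this summary and may differ.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over flat lists of per-pixel class ids.

    Assumption (not from the paper): classes that appear in neither
    `pred` nor `target` are skipped instead of being counted as 0 or 1.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six "pixels".
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5.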

Paper Structure

This paper contains 24 sections, 9 equations, 8 figures, and 6 tables.

Figures (8)

  • Figure 1: Visualization of the enhanced view-consistency throughout subsequent frames. (Top) Visualization of our view-independent 3D features using PCA, (middle) semantic segmentation without the usage of view-independent 3D features, and (bottom) proposed semantic segmentation when using our view-independent 3D features.
  • Figure 2: Overview of the proposed method.
  • Figure 3: Qualitative results on Replica and ScanNet. The figure presents generalizable (gen.) and finetuning (ft.) results on Replica (first two rows) and ScanNet (last two rows) for both rendering (rend.) and semantic segmentation (sem.). RT-GS2 is compared with Semantic-Ray [liu2023semantic].
  • Figure 4: Depth prediction of the proposed method on the Replica dataset.
  • Figure 5: Additional qualitative results of RT-GS2 on the Replica [straub2019replica] dataset.
  • ...and 3 more figures