Semantics-Controlled Gaussian Splatting for Outdoor Scene Reconstruction and Rendering in Virtual Reality

Hannah Schieber, Jacob Young, Tobias Langlotz, Stefanie Zollmann, Daniel Roth

TL;DR

Semantics-Controlled GS (SCGS) is a segmentation-driven Gaussian Splatting approach that separates large scene parts in uncontrolled, natural environments. It outperforms the state of the art in visual quality on the authors' dataset and in segmentation quality on the 3D-OVS dataset.

Abstract

Advancements in 3D rendering such as Gaussian Splatting (GS) allow novel view synthesis and real-time rendering in virtual reality (VR). However, GS-created 3D environments are often difficult to edit. For scene enhancement or to incorporate 3D assets, segmenting Gaussians by class is essential. Existing segmentation approaches are typically limited to certain types of scenes, e.g., "circular" scenes, to determine clear object boundaries. Such approaches are ineffective when removing large objects in non-"circling" scenes such as large outdoor scenes. We propose Semantics-Controlled GS (SCGS), a segmentation-driven GS approach that enables the separation of large scene parts in uncontrolled, natural environments. SCGS allows scene editing and the extraction of scene parts for VR. Additionally, we introduce a challenging outdoor dataset that overcomes the "circling" setup. We outperform the state of the art in visual quality on our dataset and in segmentation quality on the 3D-OVS dataset. We conducted an exploratory user study comparing a 360-degree video, plain GS, and SCGS in VR with a fixed viewpoint. In our subsequent main study, users were allowed to move freely while evaluating plain GS and SCGS. The main study results show that participants clearly prefer SCGS over plain GS. Overall, we present an innovative approach that surpasses the state of the art both technically and in user experience.

Paper Structure (41 sections, 5 equations, 9 figures, 5 tables)

Figures (9)

  • Figure 1: Architecture of SCGS. We first extract images from our continuous panoramic stream. Using COLMAP to estimate the camera positions, we obtain the sparse point cloud used to initialize GS. To enable 3D filtering, the data is preprocessed with a segmentation model. During 3D Gaussian training, we use a cross-entropy (CE) loss, an L1 loss, and an SSIM loss to fit our scene in both the RGB and the segmentation space. The final 3D representation can be viewed in the viewer, or individual parts of the scene can be extracted and used in VR.
  • Figure 2: Images of our dataset. Tree, Open Sea, Picnic, Outback, and Kayak (from left to right).
  • Figure 3: Example comparison of Gaussian Grouping, Gaussian Grouping (improved labels), and our approach. All scenes contain water, sky, and vegetation. The hiking sequence (top) shows outliers on the mountain (transparency), the kayak outback scene (center) shows the challenges posed by the water, and the kayak scene (bottom) shows the challenges posed by the closely stacked trees.
  • Figure 4: Class removal on our dataset. The convex hull removes too much of the scene (left). Using the same direct removal as in our case (center) leads to more outliers than our approach (right).
  • Figure 5: Use cases. Our approach can be applied to various cases of large-scale scene removal and editing: sky replacement (top row, second, bottom row), scenes outside of our dataset such as sports fields or fountains (second row), or smaller objects such as brick cars (fourth row). We display images from the original capture (left), the removed class in black (center), and the game-engine-enriched scene (right) from a novel viewpoint.
  • ...and 4 more figures
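
The Figure 1 caption names three training losses (CE, L1, and SSIM) that fit the scene in both the RGB and the segmentation space. The following is a minimal sketch of how such a combined objective might be assembled, not the authors' implementation. The function names, the weights `lambda_ssim` and `lambda_seg`, and the way the terms are summed are all assumptions made for illustration; the source does not state them. The SSIM here is also simplified to whole-image statistics, whereas SSIM is normally computed over local windows.

```python
import numpy as np


def l1_loss(pred, target):
    """Mean absolute error between two images with values in [0, 1]."""
    return float(np.mean(np.abs(pred - target)))


def global_ssim(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM from whole-image statistics.

    Simplification: standard SSIM averages this quantity over local windows.
    """
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = np.mean((pred - mu_p) * (target - mu_t))
    num = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    den = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return float(num / den)


def cross_entropy(logits, labels):
    """Mean per-pixel cross-entropy.

    logits: (H, W, C) rendered class scores; labels: (H, W) integer class ids.
    """
    # Subtract the per-pixel maximum for a numerically stable log-softmax.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return float(-picked.mean())


def combined_loss(rgb_pred, rgb_gt, seg_logits, seg_labels,
                  lambda_ssim=0.2, lambda_seg=1.0):
    """Photometric term (L1 + SSIM) plus a weighted segmentation term.

    Both weights are hypothetical placeholders, not values from the paper.
    """
    photometric = ((1 - lambda_ssim) * l1_loss(rgb_pred, rgb_gt)
                   + lambda_ssim * (1 - global_ssim(rgb_pred, rgb_gt)))
    return photometric + lambda_seg * cross_entropy(seg_logits, seg_labels)
```

In this sketch, a rendering that matches the ground-truth image and assigns nearly all probability to the correct class at every pixel yields a loss close to zero, and either a photometric or a segmentation error increases it.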