Robust 3D Gaussian Splatting for Novel View Synthesis in Presence of Distractors

Paul Ungermann, Armin Ettenhofer, Matthias Nießner, Barbara Roessle

TL;DR

This paper addresses the vulnerability of 3D Gaussian Splatting to distractors in multi-view data by introducing a distractor-aware optimization framework. It combines self-supervised residual-based masking with a learnable neural decision boundary and object-aware masking derived from Segment Anything (SAM) to ignore distractors during 3D Gaussian optimization. The approach improves PSNR by 1.86 dB over baseline 3D Gaussian Splatting on distractor-polluted scenes and also outperforms RobustNeRF, while preserving quality on clean scenes; it handles diverse distractors with minimal runtime impact from the segmentation step. Practically, this enables more reliable novel view synthesis in real-world, cluttered environments where dynamic objects frequently appear in the training data.

Abstract

3D Gaussian Splatting has shown impressive novel view synthesis results; nonetheless, it is vulnerable to dynamic objects polluting the input data of an otherwise static scene, so-called distractors. Distractors have a severe impact on the rendering quality, as they are represented as view-dependent effects or result in floating artifacts. Our goal is to identify and ignore such distractors during the 3D Gaussian optimization to obtain a clean reconstruction. To this end, we take a self-supervised approach that looks at the image residuals during the optimization to determine areas that have likely been falsified by a distractor. In addition, we leverage a pretrained segmentation network to provide object awareness, enabling more accurate exclusion of distractors. This way, we obtain segmentation masks of distractors to effectively ignore them in the loss formulation. We demonstrate that our approach is robust to various distractors and strongly improves rendering quality on distractor-polluted scenes, improving PSNR by 1.86 dB compared to 3D Gaussian Splatting.

Paper Structure

This paper contains 21 sections, 10 equations, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: Due to distractors in the scene, 3D Gaussian Splatting creates floating artifacts in the image (highlighted with circles). Our method mitigates artifacts caused by violations of the static scene assumption of Gaussian Splatting. As a key element of our approach, we optimize for semantic distractor masks simultaneously with the scene optimization, which allows us to effectively ignore distractors.
  • Figure 2: Given the ground truth and the rendered image, we can calculate the raw distractor mask using the logistic regression. Next, we intersect the raw distractor mask with the object masks from Segment Anything. The intersected mask is then used in the loss.
  • Figure 3: The first step is to calculate the residuals using the ground truth image and the rendered image. Next, we compute the mask from the residuals using the neural decision boundary (a logistic regression). Then, we calculate the mask loss using the computed mask and backpropagate it to learn the logistic regression. After that, we intersect the mask with the segmentation masks from Segment Anything and round the masks. The ground truth and the rendered image are multiplied element-wise with the distractor mask and used in the Gaussian Splatting training.
  • Figure 4: Example comparison of qualitative results for all scenes from held-out test views. Robust Gaussian Splatting is most effective in ignoring distractors while maintaining a good background and general image quality. For more baseline comparisons see the supplementary material.
  • Figure 5: Example comparison of all ablations. The background of w/o Neural is poor, but it filters the distractors efficiently. The w/o Segmentation version has a good background, but fails to remove all distractors and artifacts, resulting in blurred parts. We can see that our full version is most effective at ignoring distractors. Further ablation comparisons are provided in the supplementary material.
  • ...and 5 more figures
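
The masking pipeline described in the captions of Figures 2 and 3 can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the paper's equations are not reproduced in this excerpt, so the scalar logistic form `sigmoid(w * residual + b)`, the per-segment averaging used as the "intersection" with the segmentation, the convention that 1 marks a distractor pixel, and all function and parameter names are hypothetical. In the paper, the decision boundary is learned through a mask loss; here `w` and `b` are simply fixed by hand, and `segment_ids` stands in for object masks that would come from Segment Anything.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def raw_distractor_mask(gt, render, w, b):
    """Soft per-pixel distractor probability from image residuals.

    Assumption: a scalar logistic regression on the channel-averaged
    absolute residual; the paper's exact formulation may differ.
    """
    residual = np.abs(gt - render).mean(axis=-1)  # shape (H, W)
    return sigmoid(w * residual + b)


def object_aware_mask(soft_mask, segment_ids):
    """Combine the soft mask with object segments, then round to {0, 1}.

    Assumption: every pixel of a segment receives the segment's mean
    soft-mask value, so whole objects are kept or dropped together.
    """
    out = np.zeros_like(soft_mask)
    for seg in np.unique(segment_ids):
        region = segment_ids == seg
        out[region] = soft_mask[region].mean()
    return np.round(out)


def masked_l1(gt, render, distractor):
    """L1 loss over images multiplied element-wise with the keep mask."""
    keep = (1.0 - distractor)[..., None]  # 1 = static pixel, 0 = distractor
    return np.abs(keep * gt - keep * render).mean()


# Toy example: a 4x4 image whose top-left 2x2 block is a distractor
# that appears in the ground truth but not in the render.
gt = np.zeros((4, 4, 3))
render = np.zeros((4, 4, 3))
gt[0:2, 0:2] = 1.0
segment_ids = np.zeros((4, 4), dtype=int)
segment_ids[0:2, 0:2] = 1  # hypothetical object segment

soft = raw_distractor_mask(gt, render, w=10.0, b=-5.0)
mask = object_aware_mask(soft, segment_ids)
loss = masked_l1(gt, render, mask)
```

In this toy case the high-residual segment is flagged as a distractor as a whole, so it no longer contributes to the photometric loss. Averaging per segment is only one plausible way to make the mask object-aware; it is used here because it keeps the sketch short.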