DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering

Yihao Wang, Marcus Klasson, Matias Turkulainen, Shuzhe Wang, Juho Kannala, Arno Solin

TL;DR

DeSplat addresses distractor-induced breakdowns in 3D Gaussian Splatting by introducing an explicit decomposition into static Gaussians $\mathcal{G}_s$ and per-view distractor Gaussians $\mathcal{G}_d$. It renders the two components separately, alpha-composites them as $\mathbf{c}_{comp} = \mathbf{c}_{d} + (1 - \alpha_{d}) \mathbf{c}_{s}$, where $\alpha_{d}$ is the accumulated opacity of the distractor Gaussians, and optimizes the composite with a photometric loss. This yields a clear scene separation without external semantic priors. Key contributions include a purely splatting-based framework with Adaptive Density Control and regularization that produces competitive distractor-free reconstructions on the RobustNeRF, On-the-go, and Photo Tourism datasets while preserving fast rendering. The approach is compatible with appearance and background modelling, offering a practical path toward robust, distractor-free 3D reconstruction from unstructured image collections. This work advances reliable 3D scene reconstruction in the presence of transient occluders and demonstrates broad applicability to real-world data.
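The compositing rule above can be read as ordinary front-to-back alpha compositing in which every distractor sample lies in front of every static sample along the ray. The minimal NumPy sketch below illustrates that reading; it is not the authors' implementation. `composite_front_to_back`, `desplat_composite`, and `photometric_l1` are hypothetical names, and the plain L1 loss is a simplifying assumption (the exact photometric loss DeSplat uses is not specified here).

```python
import numpy as np

def composite_front_to_back(colors, alphas):
    """Front-to-back alpha compositing of depth-sorted samples along one ray.

    Returns the premultiplied color sum_i c_i a_i prod_{j<i} (1 - a_j)
    and the accumulated opacity 1 - prod_i (1 - a_i).
    """
    color = np.zeros(3)
    transmittance = 1.0
    for c, a in zip(colors, alphas):
        color += transmittance * a * np.asarray(c, dtype=float)
        transmittance *= 1.0 - a
    return color, 1.0 - transmittance

def desplat_composite(c_d, alpha_d, c_s):
    """c_comp = c_d + (1 - alpha_d) * c_s; also broadcasts over (H, W, 3) images
    if alpha_d has shape (H, W, 1)."""
    return c_d + (1.0 - alpha_d) * c_s

def photometric_l1(pred, target):
    """Simplified photometric loss (assumption): mean absolute error."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(target))))
```

Compositing the distractor samples first and the static samples behind them gives the same pixel color as `desplat_composite` applied to the two separate renders, because the static contribution is attenuated by the distractor transmittance $1 - \alpha_d$.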

Abstract

Gaussian splatting enables fast novel view synthesis in static 3D environments. However, reconstructing real-world environments remains challenging as distractors or occluders break the multi-view consistency assumption required for accurate 3D reconstruction. Most existing methods rely on external semantic information from pre-trained models, introducing additional computational overhead as pre-processing steps or during optimization. In this work, we propose a novel method, DeSplat, that directly separates distractors and static scene elements purely based on volume rendering of Gaussian primitives. We initialize Gaussians within each camera view for reconstructing the view-specific distractors to separately model the static 3D scene and distractors in the alpha compositing stages. DeSplat yields an explicit scene separation of static elements and distractors, achieving comparable results to prior distractor-free approaches without sacrificing rendering speed. We demonstrate DeSplat's effectiveness on three benchmark data sets for distractor-free novel view synthesis. See the project website at https://aaltoml.github.io/desplat/.
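The abstract states that distractor Gaussians are initialized within each camera view but does not say how they are placed. The sketch below assumes one simple option purely for illustration: back-project random pixels of each training image onto a plane at a small fixed depth in front of that camera, giving one set of means per view. The helper names, the OpenCV-style camera convention (z pointing forward), the intrinsics `K`, the camera-to-world matrix `c2w`, and the `depth` value are all hypothetical, not taken from the paper.

```python
import numpy as np

def init_view_distractor_means(K, cam_to_world, width, height, n_points, depth, rng):
    """Hypothetical per-view initialization: back-project random pixels at a fixed depth.

    Assumes an OpenCV-style pinhole camera (z forward) and a 4x4 camera-to-world matrix.
    Returns an (n_points, 3) array of world-space Gaussian means.
    """
    u = rng.uniform(0.0, width, n_points)
    v = rng.uniform(0.0, height, n_points)
    pix = np.stack([u, v, np.ones(n_points)])               # (3, N) homogeneous pixels
    dirs = np.linalg.inv(K) @ pix                          # (3, N) camera-frame rays with z = 1
    pts_cam = dirs * depth                                 # points on the plane z = depth
    pts_h = np.vstack([pts_cam, np.ones((1, n_points))])   # (4, N) homogeneous points
    return (cam_to_world @ pts_h)[:3].T                    # (N, 3) world-space means

def init_all_views(cameras, n_points=1000, depth=0.5, seed=0):
    """Build one independent distractor set per training view, keyed by view index."""
    rng = np.random.default_rng(seed)
    return {
        i: init_view_distractor_means(
            cam["K"], cam["c2w"], cam["width"], cam["height"], n_points, depth, rng
        )
        for i, cam in enumerate(cameras)
    }
```

Keying the sets by view index reflects the per-view design: each set is only ever rendered, and therefore only ever optimized, for its own camera.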

Paper Structure

This paper contains 47 sections, 10 equations, 14 figures, 9 tables.

Figures (14)

  • Figure 1: DeSplat: Gaussian Splatting struggles with floaters and artifacts when image sequences violate photometric consistency assumptions. Unlike existing distractor-free methods which rely on external semantic features, we propose a fully splatting-based solution grounded in photometric consistency that decomposes 3DGS scenes into static components and per-view distractors.
  • Figure 2: Qualitative visualization of the static and distractor elements recovered by our method, DeSplat (see the section on decomposed 3DGS). In the Yoda and Crab (2) scenes [sabour2023robustnerf], clean and cluttered images are captured from the same viewpoints. By explicitly modelling the scene with static and distractor Gaussians, our approach yields clear distractor segmentation and fewer artifacts than the Splatfacto baseline [nerfstudio].
  • Figure 3: Method overview of DeSplat: We decompose 3DGS to model the static scene and per-view distractors explicitly. The static scene $\mathcal{G}_{s}$ is optimized for all camera views but we allow learning of per-view distractor Gaussians $\mathcal{G}_{d}$ to model spurious transient effects which are jointly optimized with the static scene via alpha-compositing. We show how this formulation allows implicit learning of distractor segmentation masks and decomposition of the 3DGS scene into static and distractor elements.
  • Figure 4: Handling occluders. Examples of static and transient components alongside the composited images. The transient components exhibit well-defined boundaries, while the quality of the static scene is preserved. This showcases the ability of our model to learn transient components without requiring semantic supervision or external priors from pre-trained networks.
  • Figure 5: Qualitative results on the RobustNeRF data set [sabour2023robustnerf]. In the Android and Statue scenes, DeSplat generates fewer artifacts than Splatfacto and reconstructs static objects and backgrounds accurately. More qualitative examples are given in the Appendix.
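Figure 3 describes one static set $\mathcal{G}_{s}$ shared by all views and one distractor set $\mathcal{G}_{d}$ per training view, with distractor segmentation masks emerging from the distractor opacity. The sketch below illustrates that bookkeeping under stated assumptions; it is not the authors' code. `DecomposedScene`, `gaussians_for_view`, `distractor_mask`, and the 0.5 threshold are hypothetical, and Gaussian parameters are flattened into a generic `(N, P)` array for simplicity.

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class DecomposedScene:
    """Hypothetical container: one shared static set, one distractor set per training view."""
    static_params: np.ndarray                              # (N_s, P), shared by all views
    distractor_params: dict = field(default_factory=dict)  # view index -> (N_d, P)

    def gaussians_for_view(self, view_idx):
        """Return (static, distractor) parameter arrays for a camera.

        A novel (test) view has no distractor set, so it is rendered from the
        static Gaussians alone.
        """
        empty = np.zeros((0, self.static_params.shape[1]))
        return self.static_params, self.distractor_params.get(view_idx, empty)

def distractor_mask(alpha_d, threshold=0.5):
    """Binary distractor mask from accumulated distractor opacity alpha_d of shape (H, W).

    The threshold is an illustrative assumption; alpha_d itself already acts as a soft mask.
    """
    return np.asarray(alpha_d) > threshold
```

Because distractor sets exist only for training views, dropping them at evaluation time is what makes the rendered novel views distractor-free in this reading.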