Distractor-free Generalizable 3D Gaussian Splatting

Yanqi Bao, Jing Liao, Jing Huo, Yang Gao

TL;DR

DGGS mitigates the 3D inconsistency and training instability caused by distractor data in the cross-scene generalizable training setting, while enabling feedforward inference of 3DGS and distractor masks from references in unseen scenes.

Abstract

We present DGGS, a novel framework that addresses a previously unexplored challenge: $\textbf{Distractor-free Generalizable 3D Gaussian Splatting}$ (3DGS). It mitigates the 3D inconsistency and training instability caused by distractor data in the cross-scene generalizable training setting, while enabling feedforward inference of 3DGS and distractor masks from references in unseen scenes. To achieve these objectives, DGGS proposes a scene-agnostic, reference-based mask prediction and refinement module for the training phase, which effectively eliminates the impact of distractors on training stability. Moreover, we combat distractor-induced artifacts and holes at inference time through a novel two-stage inference framework for reference scoring and re-selection, complemented by a distractor pruning mechanism that further removes the influence of residual distractor 3DGS primitives. Extensive feedforward experiments on real data and on our synthetic data demonstrate DGGS's reconstruction capability on novel scenes containing distractors. Moreover, our generalizable mask prediction achieves accuracy superior even to existing scene-specific training methods. The project homepage is https://github.com/bbbbby-99/DGGS.

Paper Structure

This paper contains 47 sections, 9 equations, 16 figures, 13 tables.

Figures (16)

  • Figure 1: Overview of Our Task. Distractors are unwanted transient objects in static scene reconstruction, such as buses, balloons, or any other object. DGGS enables feed-forward 3DGS reconstruction from limited distractor data while inferring the corresponding distractor masks without extra supervision.
  • Figure 2: Distractor-free Generalizable Training. Based on the sampled reference-query pairs, DGGS first predicts 3DGS attributes and a fundamental robust mask $\mathcal{M}_{Rob}$. The Reference-based Mask Prediction module then filters this mask, and the Mask Refinement module refines the result. The entire process is supervised through a masked query loss and an auxiliary loss.
  • Figure 3: The Mask Evolution in Sec. \ref{4.1}. $\mathcal{M}_{Q}$ is obtained by filtering $\mathcal{M}_{Rob}$ with the non-distractor regions of the references. It is then filled by decoupling $\mathcal{M}_{D}$ and using segmentation results to obtain the final $\mathcal{M}$, as in Eqs. \ref{eq5}, \ref{eq10}, and \ref{eq11}. Without the reference filter, target regions are often misidentified as distractors.
  • Figure 4: Distractor-free Generalizable Inference Framework. DGGS first samples adjacent references from the scene-image pool and uses the trained model to produce a coarse 3DGS. Based on the Reference Scoring mechanism, masks and quality scores are then computed for all pool images. These masks and scores subsequently guide reference selection and Distractor Pruning to produce the fine 3DGS.
  • Figure 5: Qualitative Comparison of Re-trained Existing Methods across unseen scenes.
  • ...and 11 more figures
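
The captions for Figures 2 and 4 describe two mechanisms that a small example can make concrete: a query loss restricted by a mask, and the re-selection of references by quality score. The following is a minimal sketch under stated assumptions, not the DGGS implementation. It assumes that images and masks are flat lists of pixel values, that a mask value of 1 marks a non-distractor pixel, that the loss is an L1 error, and that a reference's quality score is simply its fraction of non-distractor pixels. The function names `masked_l1_loss`, `score_reference`, and `reselect_references` are hypothetical and do not come from the paper or its code.

```python
from typing import List, Sequence


def masked_l1_loss(pred: Sequence[float],
                   target: Sequence[float],
                   mask: Sequence[int]) -> float:
    """Mean absolute error over pixels where mask == 1.

    Assumption: 1 marks a non-distractor pixel, 0 marks a distractor pixel,
    so distractor pixels contribute nothing to the loss.
    """
    kept = [abs(p - t) for p, t, m in zip(pred, target, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0


def score_reference(mask: Sequence[int]) -> float:
    """Hypothetical quality score: fraction of non-distractor pixels."""
    return sum(mask) / len(mask)


def reselect_references(pool_masks: Sequence[Sequence[int]], k: int) -> List[int]:
    """Return indices of the k highest-scoring pool images.

    Ties are broken in favor of the lower index so the result is deterministic.
    """
    scores = [score_reference(m) for m in pool_masks]
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:k]


if __name__ == "__main__":
    # Toy 4-pixel masks for a pool of three candidate reference images.
    pool_masks = [[1, 1, 0, 0], [1, 1, 1, 1], [1, 0, 1, 1]]
    print(reselect_references(pool_masks, k=2))

    # The middle pixel is masked out, so its large error is ignored.
    print(masked_l1_loss([0.2, 0.9, 0.5], [0.0, 0.1, 0.5], [1, 0, 1]))
```

In this toy pool the second image has no distractor pixels and the third has one, so those two are picked as references for the second stage. The method itself derives its masks and scores from the coarse 3DGS rather than from a pixel count, so the scoring rule here is only a stand-in.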