RobustSplat++: Decoupling Densification, Dynamics, and Illumination for In-the-Wild 3DGS

Chuanyu Fu, Guanying Chen, Yuqi Zhang, Kunbin Yao, Yuan Xiong, Chuan Huang, Shuguang Cui, Yasuyuki Matsushita, Xiaochun Cao

TL;DR

This work identifies Gaussian densification in 3D Gaussian Splatting as a key source of artifacts when handling transient objects and illumination changes in unconstrained scenes. It introduces RobustSplat++, which combines delayed Gaussian growth and scale-cascaded mask bootstrapping, to suppress transient effects, with appearance modeling, to account for lighting variations. Experiments on multiple challenging datasets show that it outperforms existing methods in robustness and rendering quality. The approach makes in-the-wild 3DGS more reliable for novel-view synthesis and 3D modeling, with practical implications for real-world capture and rendering.

Abstract

3D Gaussian Splatting (3DGS) has gained significant attention for its real-time, photo-realistic rendering in novel-view synthesis and 3D modeling. However, existing methods struggle to accurately model in-the-wild scenes affected by transient objects and illumination variations, leading to artifacts in the rendered images. We identify that the Gaussian densification process, while enhancing scene detail capture, unintentionally contributes to these artifacts by growing additional Gaussians that model transient disturbances and illumination variations. To address this, we propose RobustSplat++, a robust solution based on several critical designs. First, we introduce a delayed Gaussian growth strategy that prioritizes optimizing static scene structure before allowing Gaussian splitting/cloning, mitigating overfitting to transient objects in early optimization. Second, we design a scale-cascaded mask bootstrapping approach that first leverages lower-resolution feature similarity supervision for reliable initial transient mask estimation, taking advantage of its stronger semantic consistency and robustness to noise, and then progresses to high-resolution supervision to achieve more precise mask prediction. Third, we combine the delayed Gaussian growth strategy and mask bootstrapping with appearance modeling to handle in-the-wild scenes that include both transients and illumination variations. Extensive experiments on multiple challenging datasets show that our method outperforms existing methods, demonstrating its robustness and effectiveness.
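The first two designs in the abstract amount to a training schedule: densification is held back at the start, and mask supervision moves from low to high resolution. The sketch below illustrates that idea only. The class `Schedule`, the function `training_phase`, and all iteration counts are hypothetical placeholders, since the abstract does not state when either switch happens or whether the two switches coincide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    # All values are illustrative assumptions, not taken from the paper.
    densify_start: int = 10_000   # first iteration at which splitting/cloning is allowed
    highres_start: int = 10_000   # first iteration using high-resolution mask supervision
    total_iters: int = 30_000


def training_phase(iteration: int, sched: Schedule) -> dict:
    """Return which components are active at a given training iteration."""
    if not 0 <= iteration < sched.total_iters:
        raise ValueError("iteration out of range")
    return {
        # Delayed Gaussian growth: optimize the initial Gaussians first,
        # and only later allow new ones to be created.
        "densify": iteration >= sched.densify_start,
        # Scale-cascaded bootstrapping: coarse feature-similarity supervision
        # first, then fine supervision for sharper mask boundaries.
        "mask_supervision": "high_res" if iteration >= sched.highres_start else "low_res",
    }


if __name__ == "__main__":
    sched = Schedule()
    for it in (0, 9_999, 10_000, 29_999):
        print(it, training_phase(it, sched))
```

Keeping the two thresholds as separate fields lets them be varied independently, which is how one would study the effect of the densification start iteration on its own.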

Paper Structure

This paper contains 53 sections, 12 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: We propose a robust solution, RobustSplat++, to handle 3DGS optimization in in-the-wild scenes. Our method can effectively handle transient distractors alone or in combination with illumination variations, yielding clean and more reliable results.
  • Figure 2: Analysis of Gaussian densification in transient object fitting. As training progresses, vanilla 3DGS (Kerbl et al., 2023) suffers from performance degradation and exhibits artifacts due to the increasing number of Gaussians. Disabling Gaussian densification notably improves the results, even achieving performance comparable to the recent robust method SpotLessSplats (Sabour et al., 2024). Despite producing transient-free rendering, 3DGS w/o densification struggles to recover fine details in regions with sparse Gaussian initialization (highlighted by red arrows).
  • Figure 3: Overview of the proposed method. The main reconstruction pipeline employs 3DGS with Delayed Gaussian Growth, generating rendered images that are optimized with a masked reconstruction loss. The method handles two types of in-the-wild inputs: (1) Images with transient distractors, where a Mask Prediction Branch predicts per-pixel masks to guide transient suppression, with the masks supervised by Scale-cascaded Mask Bootstrapping; (2) Images with both transients and illumination variations, where an Appearance Modeling Branch predicts affine coefficients from 2D embeddings, 3D embeddings, and the original Gaussian colors, and applies them to modulate the Gaussian colors.
  • Figure 4: Visualization of DINOv2, SAM2, and SD features via PCA. The last row compares the cosine similarity maps between features of the ground-truth and rendered image.
  • Figure 5: Effects of start iteration of Gaussian densification with and without the transient mask learning.
  • ...and 7 more figures
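
Two operations named in the captions above can be illustrated in a few lines: the per-pixel cosine similarity between feature maps (Figure 4) and the affine modulation of a color (Figure 3). This is a minimal pure-Python sketch and not the authors' implementation. The function names, the nested-list feature layout, the per-channel scale and offset form of the affine coefficients, and the clamping to [0, 1] are all assumptions made for illustration.

```python
import math


def cosine_similarity_map(feat_a, feat_b, eps=1e-8):
    """Per-pixel cosine similarity between two H x W x C feature maps.

    Inputs are nested lists (rows of pixels, each pixel a list of C floats).
    Returns an H x W nested list with values in [-1, 1].
    """
    sim = []
    for row_a, row_b in zip(feat_a, feat_b):
        sim_row = []
        for fa, fb in zip(row_a, row_b):
            dot = sum(x * y for x, y in zip(fa, fb))
            norm_a = math.sqrt(sum(x * x for x in fa))
            norm_b = math.sqrt(sum(y * y for y in fb))
            # eps guards against division by zero for all-zero features.
            sim_row.append(dot / max(norm_a * norm_b, eps))
        sim.append(sim_row)
    return sim


def affine_modulate(color, scale, offset):
    """Apply a per-channel affine map c' = scale * c + offset to an RGB color.

    Clamping to [0, 1] is an assumption added here to keep colors valid.
    """
    return tuple(
        min(1.0, max(0.0, s * c + o)) for c, s, o in zip(color, scale, offset)
    )


if __name__ == "__main__":
    gt = [[[1.0, 0.0], [0.0, 1.0]]]        # 1 x 2 feature map, 2 channels
    rendered = [[[1.0, 0.0], [1.0, 0.0]]]
    print(cosine_similarity_map(gt, rendered))
    print(affine_modulate((0.5, 0.5, 0.5), (2.0, 1.0, 1.0), (0.0, 0.1, -1.0)))
```

In this toy input the first pixel's features agree and the second pixel's are orthogonal, so a low similarity value marks the location where the rendered and ground-truth features disagree.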