WildGaussians: 3D Gaussian Splatting in the Wild

Jonas Kulhanek, Songyou Peng, Zuzana Kukelova, Marc Pollefeys, Torsten Sattler

TL;DR

WildGaussians extends 3D Gaussian Splatting to uncontrolled in-the-wild data by integrating per-image and per-Gaussian appearance embeddings and a DINO-based uncertainty predictor. An appearance MLP outputs affine color transforms that condition each Gaussian, while a robust uncertainty loss suppresses occluders during training, enabling accurate rendering under varying illumination and occlusion. The approach preserves real-time rendering and allows appearance to be baked back into the base 3DGS representation, achieving state-of-the-art results on challenging datasets like NeRF On-the-go and Photo Tourism. Sky handling and test-time appearance optimization further enhance robustness, though limitations remain in capturing highlights and extremely occluded regions.
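The appearance conditioning summarized above can be illustrated with a minimal NumPy sketch. It assumes a small two-layer MLP that takes a per-image embedding and per-Gaussian embeddings and outputs a per-channel scale and offset applied to each Gaussian's color. The names `init_params`, `appearance_mlp`, and `apply_affine`, the layer sizes, and the identity initialization are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def init_params(rng, image_dim, gaussian_dim, hidden):
    """Hypothetical 2-layer MLP weights, initialized so the affine map starts as identity."""
    in_dim = image_dim + gaussian_dim
    return {
        "w1": rng.normal(scale=0.1, size=(in_dim, hidden)),
        "b1": np.zeros(hidden),
        # Zero output weights plus bias [1,1,1,0,0,0] give gamma=1, beta=0 at the start.
        "w2": np.zeros((hidden, 6)),
        "b2": np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0]),
    }


def appearance_mlp(image_emb, gaussian_emb, params):
    """Map one image embedding (D_i,) and N Gaussian embeddings (N, D_g) to affine params."""
    n = gaussian_emb.shape[0]
    tiled = np.broadcast_to(image_emb, (n, image_emb.shape[0]))
    x = np.concatenate([tiled, gaussian_emb], axis=1)
    h = np.maximum(x @ params["w1"] + params["b1"], 0.0)  # ReLU hidden layer
    out = h @ params["w2"] + params["b2"]                 # (N, 6)
    gamma, beta = out[:, :3], out[:, 3:]                  # per-channel scale and offset
    return gamma, beta


def apply_affine(color, gamma, beta):
    """Apply the per-Gaussian affine transform to (N, 3) colors."""
    return gamma * color + beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng, image_dim=4, gaussian_dim=3, hidden=8)
    image_emb = rng.normal(size=4)
    gaussian_emb = rng.normal(size=(5, 3))
    colors = rng.uniform(size=(5, 3))
    gamma, beta = appearance_mlp(image_emb, gaussian_emb, params)
    print(apply_affine(colors, gamma, beta).shape)
```

Because the transform is a simple per-Gaussian scale and offset, the transformed colors for a fixed image embedding can be computed once and stored, which is one way to read the TL;DR's remark that appearance can be baked back into the base 3DGS representation.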

Abstract

While the field of 3D scene reconstruction is dominated by NeRFs due to their photorealistic quality, 3D Gaussian Splatting (3DGS) has recently emerged, offering similar quality with real-time rendering speeds. However, both methods primarily excel with well-controlled 3D scenes, while in-the-wild data - characterized by occlusions, dynamic objects, and varying illumination - remains challenging. NeRFs can adapt to such conditions easily through per-image embedding vectors, but 3DGS struggles due to its explicit representation and lack of shared parameters. To address this, we introduce WildGaussians, a novel approach to handle occlusions and appearance changes with 3DGS. By leveraging robust DINO features and integrating an appearance modeling module within 3DGS, our method achieves state-of-the-art results. We demonstrate that WildGaussians matches the real-time rendering speed of 3DGS while surpassing both 3DGS and NeRF baselines in handling in-the-wild data, all within a simple architectural framework.

Paper Structure

This paper contains 18 sections, 11 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: WildGaussians extends 3DGS (Kerbl et al., 2023) to scenes with appearance and illumination changes (left). It jointly optimizes a DINO-based (Oquab et al., 2024) uncertainty predictor to handle occlusions (right).
  • Figure 2: Overview of the core components of WildGaussians. Left: appearance modeling. Per-Gaussian and per-image embeddings are passed as input to the appearance MLP, which outputs the parameters of an affine transformation applied to the Gaussian's view-dependent color. Right: uncertainty modeling. An uncertainty estimate is obtained by a learned transformation of the GT image's DINO features. To train the uncertainty, we use the DINO cosine similarity (dashed lines).
  • Figure 3: Uncertainty Losses Under Appearance Changes. We compare MSE and DSSIM uncertainty losses (used by NeRF-W (Martin-Brualla et al., 2021) and NeRF On-the-go (Ren et al., 2024)) to our DINO cosine similarity loss. Under heavy appearance changes (as in Images 1 and 2), both MSE and DSSIM fail to focus on the occluders (humans): they falsely downweight the background while partly ignoring the occluders.
  • Figure 4: Comparison on the NeRF On-the-go Dataset (Ren et al., 2024). For both the Fountain and Patio-High scenes, the baseline methods exhibit different levels of artifacts in the rendering, while our method removes all occluders and shows the best view synthesis results.
  • Figure 5: Comparison on the Photo Tourism Dataset (Snavely et al., 2006). In the first row, note that while none of the methods can represent the reflections and details of the flowing water, 3DGS and WildGaussians can provide at least some details even though there are no multiview constraints on the flowing water. In the second row, notice how 3DGS tries to 'simulate' darkness by placing dark, semi-transparent Gaussians in front of the cameras. For WildGaussians, the text on the building is legible. WildGaussians is able to recover fine details in the last row.
  • ...and 5 more figures
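
The DINO cosine similarity mentioned in the Figure 2 and Figure 3 captions can be sketched as a per-patch comparison of two feature maps. The sketch below is a minimal NumPy illustration under stated assumptions: the feature maps are plain arrays standing in for DINO features of the ground-truth and rendered images, and the function names are hypothetical. It only computes the similarity signal; the exact form of the paper's uncertainty loss is not given in this excerpt and is not reproduced here.

```python
import numpy as np


def patch_cosine_similarity(feat_a, feat_b, eps=1e-8):
    """Cosine similarity per patch for two (H, W, C) feature maps; returns (H, W)."""
    dot = np.sum(feat_a * feat_b, axis=-1)
    norm = np.linalg.norm(feat_a, axis=-1) * np.linalg.norm(feat_b, axis=-1)
    return dot / np.maximum(norm, eps)


def patch_dissimilarity(feat_gt, feat_render):
    """1 - cosine similarity, clipped to [0, 2]; larger values flag mismatched patches."""
    sim = patch_cosine_similarity(feat_gt, feat_render)
    return np.clip(1.0 - sim, 0.0, 2.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in features: a 2x3 grid of patches with 8 channels each (assumed sizes).
    feat_gt = rng.normal(size=(2, 3, 8))
    feat_render = feat_gt.copy()
    feat_render[0, 0] = -feat_gt[0, 0]  # simulate one patch that disagrees strongly
    print(patch_dissimilarity(feat_gt, feat_render))
```

Cosine similarity depends only on the direction of the feature vectors, not their magnitude, so a patch whose features are merely rescaled still scores as similar, while a patch whose features point elsewhere (here, the negated one) stands out.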