Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections

Dongbin Zhang, Chuming Wang, Weitao Wang, Peihao Li, Minghan Qin, Haoqian Wang

TL;DR

GS-W introduces a per-point 3D Gaussian splatting framework that separates intrinsic and dynamic appearance at each point, enabling robust novel view synthesis from unconstrained image collections. It adds adaptive sampling across multiple 2D feature maps and a 2D visibility map to handle dynamic lighting, weather, and transient occluders, while maintaining real-time rendering via a tile-based rasterizer. The method outperforms NeRF-based baselines in both rendering quality (PSNR/SSIM/LPIPS) and speed (≈200 FPS, over 1000× faster than some NeRF methods), and ablations confirm the importance of per-point dynamics, sampling, and transient handling. This approach advances unconstrained view synthesis by combining explicit 3D Gaussian representations with disentangled appearance modeling and efficient rendering pipelines, offering flexible appearance tuning and strong generalization across diverse scenes.

Abstract

Novel view synthesis from unconstrained in-the-wild images remains a meaningful but challenging task. Photometric variation and transient occluders in such images make it difficult to reconstruct the original scene accurately. Previous approaches tackle the problem by introducing a global appearance feature in Neural Radiance Fields (NeRF). In the real world, however, the appearance of each point in a scene is determined by its own intrinsic material attributes and by the varying environmental influences it receives. Inspired by this fact, we propose Gaussian in the Wild (GS-W), a method that uses 3D Gaussian points to reconstruct the scene and introduces separate intrinsic and dynamic appearance features for each point, capturing the unchanged scene appearance along with dynamic variations such as illumination and weather. Additionally, an adaptive sampling strategy allows each Gaussian point to focus on local, detailed information more effectively. We also reduce the impact of transient occluders using a 2D visibility map. Experiments demonstrate that GS-W achieves better reconstruction quality and detail than NeRF-based methods, with faster rendering speed. Video results and code are available at https://eastbeanzhang.github.io/GS-W/.

Paper Structure (29 sections, 14 equations, 10 figures, 7 tables)

Figures (10)

  • Figure 1: With an unconstrained image collection input, GS-W can render novel views with appearance tuning, achieving state-of-the-art quality and faster rendering speed.
  • Figure 1: More qualitative comparison results on Brandenburg. Our method captures local highlights and avoids artifacts caused by transient objects, as observed in K-Planes and 3DGS.
  • Figure 2: An overview of the GS-W framework. We begin with a scene's reference image and its camera pose $P$. After extracting image features with a U-Net model, we reshape them into $K$ feature maps and one projection feature map. Each Gaussian point $GP_i$ then adaptively samples features from these maps to obtain its dynamic appearance feature $df_i$. This feature is fused with the intrinsic appearance feature $sf_i$ through a fusion network and decoded into the Gaussian point color $c_i$. Finally, all Gaussian points are rendered using a tile rasterizer.
  • Figure 2: More qualitative comparison experiments on appearance tuning. Similar to Figure 4, images are rendered at the same camera pose with increasing weight of features extracted from the image. Our method not only captures environmental information better but also naturally adjusts its influence on the scene.
  • Figure 3: Qualitative results on the test set of three PhotoTourism scenes. GS-W recovers finer details of appearance (e.g., the horse sculpture in Brandenburg, the sky and clouds in Sacre, the light on columns, and the color of windows in Trevi). Moreover, GS-W reconstructs more consistent and detailed scenes (e.g., the distant tower in Brandenburg, the cavities in Sacre, and the distant building in Trevi).
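
The per-point pipeline described in the framework overview caption (sample features from several 2D maps, fuse them with the intrinsic feature, decode a color) can be illustrated with a minimal pure-Python sketch. Everything named here is an assumption made for illustration and not the authors' implementation: the helpers `bilinear_sample`, `dynamic_feature`, and `fuse_and_decode` are hypothetical, each point is assumed to hold one 2D sampling coordinate per feature map, and a single linear layer with a sigmoid stands in for the fusion network and color decoder.

```python
import math


def bilinear_sample(feature_map, u, v):
    """Sample an H x W x C map (nested lists) at continuous pixel coords (u, v)."""
    h, w = len(feature_map), len(feature_map[0])
    # Clamp the coordinates so all four neighbours stay inside the map.
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = u - x0, v - y0
    out = []
    for c in range(len(feature_map[0][0])):
        top = (1 - ax) * feature_map[y0][x0][c] + ax * feature_map[y0][x1][c]
        bottom = (1 - ax) * feature_map[y1][x0][c] + ax * feature_map[y1][x1][c]
        out.append((1 - ay) * top + ay * bottom)
    return out


def dynamic_feature(feature_maps, sample_coords):
    """Concatenate one sample per feature map into a dynamic feature (assumed layout)."""
    df = []
    for fmap, (u, v) in zip(feature_maps, sample_coords):
        df.extend(bilinear_sample(fmap, u, v))
    return df


def fuse_and_decode(sf, df, weights, bias):
    """Toy stand-in for fusion + decoding: linear layer on [sf, df], then sigmoid."""
    x = list(sf) + list(df)
    color = []
    for row, b in zip(weights, bias):
        z = sum(wi * xi for wi, xi in zip(row, x)) + b
        color.append(1.0 / (1.0 + math.exp(-z)))
    return color


# Two tiny 2 x 2 single-channel feature maps and one Gaussian point.
fmap_a = [[[0.0], [1.0]], [[2.0], [3.0]]]
fmap_b = [[[4.0], [4.0]], [[4.0], [4.0]]]
df_i = dynamic_feature([fmap_a, fmap_b], [(0.5, 0.5), (0.2, 0.9)])
sf_i = [0.1]  # intrinsic feature of the point (illustrative value)
zero_weights = [[0.0] * (len(sf_i) + len(df_i)) for _ in range(3)]
c_i = fuse_and_decode(sf_i, df_i, zero_weights, [0.0, 0.0, 0.0])
```

In this sketch the dynamic feature depends on the reference image through the feature maps, while the intrinsic feature is stored on the point itself, which mirrors the separation the caption describes. In a trained model the sampling coordinates and network weights would be learned rather than fixed by hand.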