Two-Stage Gaussian Splatting Optimization for Outdoor Scene Reconstruction

Deborah Pintani, Ariel Caputo, Noah Lewis, Marc Stamminger, Fabio Pellacini, Andrea Giachetti

TL;DR

This work tackles outdoor scene reconstruction with large background regions by introducing a two-stage Gaussian Splatting (GS) pipeline that explicitly separates background and foreground using two concentric shells. Stage 1 models the background within an outer spherical shell, using the $L_{shell}$ and $L_{planarity}$ losses together with a visibility-based pruning strategy; Stage 2 adds the foreground in the inner region using the standard GS loss and boundary pruning. The approach yields cleaner background representations, reduces floaters, and enables automatic environment-map generation from the background, with quantitative gains over baselines across five outdoor datasets and practical implications for VR and mixed-reality rendering. The cost is longer training time, which does not affect runtime rendering performance. $L_{shell}$ and $L_{planarity}$ keep the background geometry stable and mitigate radial artifacts, which contributes to perceptually better novel view synthesis in challenging outdoor scenes.

Abstract

Outdoor scene reconstruction remains challenging due to the stark contrast between well-textured, nearby regions and distant backgrounds dominated by low detail, uneven illumination, and sky effects. We introduce a two-stage Gaussian Splatting framework that explicitly separates and optimizes these regions, yielding higher-fidelity novel view synthesis. In stage one, background primitives are initialized within a spherical shell and optimized using a loss that combines a background-only photometric term with two geometric regularizers: one constraining Gaussians to remain inside the shell, and another aligning them with local tangential planes. In stage two, foreground Gaussians are initialized from a Structure-from-Motion reconstruction, added and refined using the standard rendering loss, while the background set remains fixed but contributes to the final image formation. Experiments on diverse outdoor datasets show that our method reduces background artifacts and improves perceptual quality compared to state-of-the-art baselines. Moreover, the explicit background separation enables automatic, object-free environment map estimation, opening new possibilities for photorealistic outdoor rendering and mixed-reality applications.
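To make the first geometric regularizer concrete, the following is a minimal sketch of one way a "stay inside the shell" penalty could be written. It is an assumption made for illustration, not the paper's definition of $L_{shell}$: the hinge form, the function name `shell_penalty`, and the parameters `r_inner` and `r_outer` (the inner and outer shell radii) are all hypothetical.

```python
import math

def shell_penalty(centers, origin, r_inner, r_outer):
    """Mean hinge penalty on Gaussian centers outside [r_inner, r_outer].

    Assumed illustrative form; the paper's actual L_shell may differ.
    """
    total = 0.0
    for c in centers:
        d = math.dist(c, origin)  # distance of the center from the scene origin
        # Zero inside the shell; grows linearly with the distance
        # to the nearest shell boundary when the center is outside.
        total += max(0.0, r_inner - d) + max(0.0, d - r_outer)
    return total / len(centers)

# One center inside the shell, one beyond the outer radius, one inside the inner radius.
centers = [(0.0, 0.0, 5.0), (0.0, 12.0, 0.0), (1.0, 0.0, 0.0)]
penalty = shell_penalty(centers, (0.0, 0.0, 0.0), r_inner=2.0, r_outer=10.0)
print(penalty)
```

A hinge is used here only because it leaves primitives that are already inside the shell unaffected. In an actual training loop such a term would be computed on differentiable tensors and added to the background-only photometric term.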

Paper Structure

This paper contains 17 sections, 2 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: All the steps of our two-stage/two-shell Gaussian Splatting pipeline. From left to right: a frame of the input dataset, metric depth map measuring the distance from the center of the scene, masked background image, output of the outer shell's optimization, initialization of the inner shell with photogrammetry, complete reconstruction with outer and inner Gaussian shells, synthetic view.
  • Figure 2: Overview of our two-pass pipeline. From a set of input images, we compute camera poses, a sparse point cloud, and monocular depth maps. Background points are segmented using distance maps and used to initialize a geodesic sphere for the first pass of 3D Gaussian Splatting, producing the background reconstruction with custom losses and pruning. The second pass reconstructs the full scene by combining the background, foreground points, and full images using standard 3D Gaussian Splatting with our pruning strategy.
  • Figure 3: Starting from a collection of photos or a video captured in a limited region (navigation area) of a real environment, we create a Gaussian representation of the light field in the environment with two distinct sets of primitives. The first (background) is constrained to stay within a spherical shell defined by a minimum distance $R_i$ and a maximum distance $R_o$ from the navigation area center. The second (nearby area) represents the objects with a distance from the center lower than $R_i$.
  • Figure 4: Visualization of the background initialization. Points corresponding to infinity (e.g., the sky) are projected onto the outer sphere in a geodesic pattern, while other distant elements are sampled within the spherical shell.
  • Figure 5: Comparison of novel view syntheses for the five benchmark datasets. Arrows highlight regions with relevant artifacts (see text). Our method consistently provides artifact-free results for sky and very distant objects.
  • ...and 3 more figures
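The split described in the captions of Figures 3 and 4 can be sketched as a simple distance test against the two radii. This is a hedged illustration under stated assumptions: the function name `split_and_project` is hypothetical, and points beyond $R_o$ are moved radially onto the outer sphere, which is a simplification that stands in for the geodesic sampling pattern mentioned in Figure 4.

```python
import math

def split_and_project(points, center, r_inner, r_outer):
    """Partition points into foreground and background by distance from center.

    Assumption: points farther than r_outer are clamped radially onto the
    outer sphere; the paper instead uses a geodesic pattern for such points.
    """
    foreground, background = [], []
    for p in points:
        offset = [a - b for a, b in zip(p, center)]
        d = math.sqrt(sum(o * o for o in offset))
        if d < r_inner:
            # Nearby area: closer to the center than the inner radius.
            foreground.append(tuple(p))
        elif d <= r_outer:
            # Already inside the spherical shell: keep as background.
            background.append(tuple(p))
        else:
            # Beyond the shell: rescale the offset so the point lies on the outer sphere.
            scale = r_outer / d
            background.append(tuple(c + o * scale for c, o in zip(center, offset)))
    return foreground, background

points = [(1.0, 0.0, 0.0), (0.0, 5.0, 0.0), (0.0, 0.0, 20.0)]
fg, bg = split_and_project(points, (0.0, 0.0, 0.0), r_inner=2.0, r_outer=10.0)
print(fg, bg)
```

In this toy input, the first point falls in the nearby area, the second is kept inside the shell, and the third is pulled back onto the outer sphere.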