Segmentation-Driven Initialization for Sparse-view 3D Gaussian Splatting

Yi-Hsin Li, Thomas Sikora, Sebastian Knorr, Mårten Sjöström

TL;DR

This work tackles sparse-view 3D scene reconstruction with 3D Gaussian Splatting by addressing initialization inefficiency. It introduces Segmentation-Driven Initialization for Gaussian Splatting (SDI-GS), which uses region-based 2D segmentation to downsample a dense lifted point cloud and initialize a compact, structure-aware set of Gaussians. The method achieves comparable or superior rendering quality to SfM-based and SfM-free baselines while reducing Gaussian counts and memory by up to 50% and 75% respectively, and preserving fast training. By decoupling initialization from dense pixel lifting and enforcing cross-view structural consistency, SDI-GS improves practicality for constrained-view scenarios and real-time applications. Future work could explore adaptive segmentation and handling of dynamic or challenging lighting conditions.

Abstract

Sparse-view synthesis remains a challenging problem due to the difficulty of recovering accurate geometry and appearance from limited observations. While recent advances in 3D Gaussian Splatting (3DGS) have enabled real-time rendering with competitive quality, existing pipelines often rely on Structure-from-Motion (SfM) for camera pose estimation, an approach that struggles in genuinely sparse-view settings. Moreover, several SfM-free methods replace SfM with multi-view stereo (MVS) models, but generate massive numbers of 3D Gaussians by back-projecting every pixel into 3D space, leading to high memory costs. We propose Segmentation-Driven Initialization for Gaussian Splatting (SDI-GS), a method that mitigates inefficiency by leveraging region-based segmentation to identify and retain only structurally significant regions. This enables selective downsampling of the dense point cloud, preserving scene fidelity while substantially reducing Gaussian count. Experiments across diverse benchmarks show that SDI-GS reduces Gaussian count by up to 50% and achieves comparable or superior rendering quality in PSNR and SSIM, with only marginal degradation in LPIPS. It further enables faster training and lower memory footprint, advancing the practicality of 3DGS for constrained-view scenarios.
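
To make the memory cost of per-pixel lifting concrete, here is a minimal sketch of back-projecting every pixel of one view into 3D. It assumes a pinhole camera with a per-pixel depth map. That is a simplification for illustration only: the pipeline described here relies on a learned stereo model for lifting, and the function names and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with the given depth to a camera-space 3D point.

    Assumes an ideal pinhole camera (no distortion); fx, fy are focal
    lengths in pixels and (cx, cy) is the principal point.
    """
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def lift_image(depth_map, fx, fy, cx, cy):
    """Back-project every pixel of one view: one 3D point per pixel."""
    points = []
    for v, row in enumerate(depth_map):
        for u, d in enumerate(row):
            points.append(backproject(u, v, d, fx, fy, cx, cy))
    return points


# Toy 2x3 depth map (hypothetical values): the point count equals H * W.
toy_depth = [[2.0, 2.0, 2.0],
             [2.0, 2.0, 2.0]]
toy_points = lift_image(toy_depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Because the point count equals height times width for each view, a dense initialization grows linearly with both image resolution and the number of views. That growth is what selective downsampling targets.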

Paper Structure

This paper contains 25 sections, 7 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Overview of our segmentation-driven initialization pipeline for sparse-view 3D Gaussian Splatting. Given sparse input views, we estimate camera poses and lift all image pixels into a dense 3D point cloud. We apply region-based segmentation on each image and propagate these segmentations across views to construct segment-aware 3D labels. These labels guide a structured downsampling process that prunes redundant points while preserving geometric structure. The resulting filtered points initialize 3D Gaussians, which are jointly optimized with camera poses to produce the final radiance field.
  • Figure 2: Visualization of segmentation-guided downsampling. (a) Input RGB image; (b) region-based segmentation map; (c) retained pixel mask after stratified sampling; (d) final downsampled 3D points projected onto the image plane. Redundant points in flat areas (e.g., sky) are removed, while structural details are preserved by retaining more points in high-frequency regions.
  • Figure 3: Qualitative comparison across three datasets under 3-view (top subrow) and 12-view (bottom subrow) settings. Each row corresponds to a different dataset. Within each dataset, we show rendering results for CF-3DGS, InstantSplat, and our method. CF-3DGS exhibits severe artifacts due to unreliable pose estimation. In contrast, both InstantSplat and our method use MASt3R for initialization and refine poses during training, leading to stable and accurate reconstructions. Our segmentation-driven downsampling further reduces memory usage without compromising visual quality.
  • Figure 4: Compression-performance trend across increasing view counts (3, 6, 12), averaged over Tanks and Temples, MVImgNet, and Mip-NeRF 360. Each line shows PSNR versus file size for a method. As the number of input views increases, our method achieves greater compression gains (reducing file size by up to 75% at 12 views) while maintaining comparable reconstruction quality.
  • Figure 5: Qualitative comparison with SfM-based methods (FSGS, SparseGS, CoR-GS) and SfM-free InstantSplat. SDI-GS matches the visual quality of SfM-based pipelines while keeping the lightweight memory footprint of SfM-free methods (Table 4) and drastically reducing training time. Unlike InstantSplat, which demands more storage, our segmentation-driven initialization provides a compact, efficient representation without compromising structural fidelity.
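
The stratified sampling step described in the Figure 2 caption can be sketched as follows. This is an illustrative approximation, not the authors' implementation: it assumes a fixed per-segment budget (`per_segment`, a hypothetical parameter) and uniform random selection within each segment. Under that assumption, a large flat region such as sky forms a single segment and keeps only a few pixels, while a detailed area that splits into many small segments keeps proportionally more.

```python
import random
from collections import defaultdict


def stratified_keep_mask(labels, per_segment=2, seed=0):
    """Return a boolean mask keeping at most `per_segment` pixels per segment.

    `labels` is a 2D list of integer segment ids, one per pixel. The
    fixed per-segment budget is an assumption made for this sketch.
    """
    rng = random.Random(seed)

    # Group pixel coordinates by their segment id.
    by_segment = defaultdict(list)
    for v, row in enumerate(labels):
        for u, seg in enumerate(row):
            by_segment[seg].append((v, u))

    # Sample up to `per_segment` pixels from each segment.
    keep = [[False] * len(row) for row in labels]
    for seg in sorted(by_segment):
        pixels = by_segment[seg]
        k = min(per_segment, len(pixels))
        for v, u in rng.sample(pixels, k):
            keep[v][u] = True
    return keep


# Toy label map (hypothetical): one large flat segment (id 0) above a
# row of four single-pixel segments standing in for fine detail.
toy_labels = [[0, 0, 0, 0],
              [0, 0, 0, 0],
              [1, 2, 3, 4]]
toy_mask = stratified_keep_mask(toy_labels, per_segment=2)
kept_total = sum(sum(row) for row in toy_mask)
```

In the toy example the eight-pixel flat segment contributes two retained pixels and each single-pixel segment contributes one, so 6 of 12 pixels survive. Only the 3D points lifted from retained pixels would then be used to initialize Gaussians.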