Segmentation-Driven Initialization for Sparse-view 3D Gaussian Splatting
Yi-Hsin Li, Thomas Sikora, Sebastian Knorr, Mårten Sjöström
TL;DR
This work tackles sparse-view 3D scene reconstruction with 3D Gaussian Splatting (3DGS) by addressing inefficient initialization. It introduces Segmentation-Driven Initialization for Gaussian Splatting (SDI-GS), which uses region-based 2D segmentation to downsample a dense lifted point cloud and initialize a compact, structure-aware set of Gaussians. The method achieves rendering quality comparable or superior to SfM-based and SfM-free baselines while reducing the Gaussian count by up to 50% and memory by up to 75%, and it preserves fast training. By decoupling initialization from dense per-pixel lifting and enforcing cross-view structural consistency, SDI-GS makes 3DGS more practical for constrained-view scenarios and real-time applications. Future work could explore adaptive segmentation and the handling of dynamic scenes or challenging lighting conditions.
Abstract
Sparse-view synthesis remains challenging because accurate geometry and appearance are difficult to recover from limited observations. While recent advances in 3D Gaussian Splatting (3DGS) have enabled real-time rendering with competitive quality, existing pipelines often rely on Structure-from-Motion (SfM) for camera pose estimation, an approach that struggles in genuinely sparse-view settings. Several SfM-free methods replace SfM with multi-view stereo (MVS) models, but they generate massive numbers of 3D Gaussians by back-projecting every pixel into 3D space, leading to high memory costs. We propose Segmentation-Driven Initialization for Gaussian Splatting (SDI-GS), a method that mitigates this inefficiency by leveraging region-based segmentation to identify and retain only structurally significant regions. This enables selective downsampling of the dense point cloud, preserving scene fidelity while substantially reducing the Gaussian count. Experiments across diverse benchmarks show that SDI-GS reduces the Gaussian count by up to 50% and achieves comparable or superior rendering quality in PSNR and SSIM, with only marginal degradation in LPIPS. It also enables faster training and a lower memory footprint, advancing the practicality of 3DGS for constrained-view scenarios.
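To make the idea of segmentation-guided downsampling concrete, the following is a minimal sketch and not the authors' implementation. It assumes a per-pixel depth map and a pinhole intrinsic matrix are already available (for example from an MVS model), and that a 2D segmentation is given as an integer label map. The function names `backproject` and `downsample_by_segment`, the `keep_ratio` parameter, and the uniform random sampling inside each segment are all illustrative assumptions; the abstract does not specify how points are selected within a region.

```python
import numpy as np

def backproject(depth, K):
    """Lift every pixel of a depth map to a camera-space 3D point, shape (H*W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Homogeneous pixel coordinates (u, v, 1), one row per pixel.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    # Apply K^{-1} row-wise to get rays with unit z, then scale by depth.
    rays = pix @ np.linalg.inv(K).T
    return rays * depth.reshape(-1, 1)

def downsample_by_segment(points, labels, keep_ratio=0.5, min_per_segment=1, seed=0):
    """Keep a fraction of the points in each segment (hypothetical selection rule).

    Sampling per segment, rather than globally, guarantees that every
    segmented region contributes at least `min_per_segment` points.
    """
    rng = np.random.default_rng(seed)
    flat = labels.reshape(-1)
    kept = []
    for seg_id in np.unique(flat):
        idx = np.flatnonzero(flat == seg_id)
        n_keep = max(min_per_segment, int(round(keep_ratio * idx.size)))
        n_keep = min(n_keep, idx.size)
        kept.append(rng.choice(idx, size=n_keep, replace=False))
    kept = np.sort(np.concatenate(kept))
    return points[kept], kept

# Toy usage: a flat 4x4 depth map split into two segments (left and right halves).
depth = np.full((4, 4), 2.0)
K = np.array([[2.0, 0.0, 2.0],
              [0.0, 2.0, 2.0],
              [0.0, 0.0, 1.0]])
labels = np.zeros((4, 4), dtype=int)
labels[:, 2:] = 1

dense = backproject(depth, K)
sparse, kept_idx = downsample_by_segment(dense, labels, keep_ratio=0.5)
print(dense.shape, sparse.shape)
```

In this toy case the dense cloud has one point per pixel and the downsampled cloud keeps half of the points in each region. Each retained point would then seed one Gaussian, so the initial Gaussian count scales with `keep_ratio` rather than with image resolution.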
