ActiveInitSplat: How Active Image Selection Helps Gaussian Splatting
Konstantinos D. Polyzos, Athanasios Bacharis, Saketh Madhuvarasu, Nikos Papanikolopoulos, Tara Javidi
TL;DR
ActiveInitSplat introduces an active camera view selection framework for Gaussian splatting (GS). It optimizes a black-box objective that measures the quality of the 3D representation through point-cloud density and voxel occupancy. Using a Gaussian-process surrogate, it selects diverse viewpoints to improve GS initialization and rendering in both dense- and sparse-view regimes, without requiring depth or scene priors. The approach is validated on benchmark datasets and a real-world drone platform, where it shows consistent improvements in LPIPS, SSIM, and PSNR over passive view strategies and proves compatible with GS variants regardless of their architecture. In practice, this enables efficient, high-quality real-time 3D scene rendering with reduced image acquisition demands.
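To make the surrogate-based selection step concrete, the following is a minimal sketch of how a Gaussian-process surrogate can rank candidate camera views when the quality objective is a black box. It is not the authors' implementation: the squared-exponential kernel, the upper-confidence-bound (UCB) acquisition rule, the view parameterization as rows of a feature matrix, and all function names (`rbf_kernel`, `gp_posterior`, `select_next_view`) are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between rows of A and rows of B (assumed kernel)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)

def gp_posterior(X_obs, y_obs, X_cand, noise=1e-6, length_scale=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at the candidates."""
    K = rbf_kernel(X_obs, X_obs, length_scale) + noise * np.eye(len(X_obs))
    K_s = rbf_kernel(X_obs, X_cand, length_scale)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    # The prior variance is 1 because the kernel equals 1 at zero distance.
    var = 1.0 - np.einsum("ij,ij->j", K_s, np.linalg.solve(K, K_s))
    return mean, np.sqrt(np.clip(var, 0.0, None))

def select_next_view(X_obs, y_obs, X_cand, beta=2.0):
    """Index of the candidate view maximizing the UCB score mean + beta * std."""
    mean, std = gp_posterior(X_obs, y_obs, X_cand)
    return int(np.argmax(mean + beta * std))
```

Here `X_obs` holds the parameters of views already evaluated, `y_obs` their observed objective values, and `X_cand` the views still available. A larger `beta` favors unexplored viewpoints, while `beta=0` simply picks the candidate with the highest predicted objective.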
Abstract
Gaussian splatting (GS), along with its extensions and variants, delivers outstanding real-time scene rendering performance with reduced storage demands and high computational efficiency. The selection of 2D images capturing the scene of interest is crucial for properly initializing and training GS and therefore markedly affects rendering performance, yet prior works rely on passively and typically densely selected 2D images. In contrast, this paper proposes ActiveInitSplat, a novel framework for actively selecting training images for proper initialization and training of GS. ActiveInitSplat relies on density and occupancy criteria of the 3D scene representation that results from the selected 2D images. These criteria ensure that the images are captured from diverse viewpoints, leading to better scene coverage, and that the initialized Gaussian functions are well aligned with the actual 3D structure. Numerical tests on well-known simulated and real environments demonstrate the merits of ActiveInitSplat: it yields significant GS rendering performance improvement over passive GS baselines in both dense- and sparse-view settings, as measured by the widely adopted LPIPS, SSIM, and PSNR metrics.
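As a rough illustration of what density and occupancy criteria on a 3D representation might look like, the sketch below voxelizes a point cloud and reports two numbers: the fraction of voxels that contain at least one point, and the mean number of points per occupied voxel. The abstract does not define the criteria precisely, so the exact definitions, the axis-aligned grid, and the names `voxel_counts` and `occupancy_and_density` are hypothetical.

```python
from collections import Counter
from math import floor

def voxel_counts(points, voxel_size):
    """Count how many points fall in each voxel of an axis-aligned grid."""
    return Counter(tuple(floor(c / voxel_size) for c in p) for p in points)

def occupancy_and_density(points, voxel_size, total_voxels):
    """Return (fraction of voxels occupied, mean points per occupied voxel)."""
    counts = voxel_counts(points, voxel_size)
    if not counts:
        return 0.0, 0.0
    occupancy = len(counts) / total_voxels
    density = sum(counts.values()) / len(counts)
    return occupancy, density
```

Under these assumed definitions, images taken from diverse viewpoints would tend to raise occupancy by reconstructing points in more voxels, while density reflects how well each covered region is populated with points.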
