ActiveInitSplat: How Active Image Selection Helps Gaussian Splatting

Konstantinos D. Polyzos, Athanasios Bacharis, Saketh Madhuvarasu, Nikos Papanikolopoulos, Tara Javidi

TL;DR

ActiveInitSplat introduces an active camera view selection framework for Gaussian splatting (GS) that optimizes a black-box 3D representation quality objective based on point-cloud density and voxel occupancy. Using a Gaussian-process surrogate, it selects diverse viewpoints to improve GS initialization and rendering in both dense- and sparse-view regimes, without requiring depth or scene priors. The approach, validated on benchmark datasets and a real-world drone platform, shows consistent improvements in LPIPS, SSIM, and PSNR over passive view strategies and demonstrates architecture-agnostic compatibility with GS variants. This work has practical impact for efficient, high-quality real-time 3D scene rendering with reduced image acquisition demands.
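
The Gaussian-process surrogate idea can be illustrated with a minimal sketch. It assumes a finite pool of candidate viewpoints, a scalar quality score per viewpoint (`quality_fn`, a hypothetical stand-in), an RBF kernel, and an upper-confidence-bound selection rule. None of these choices is stated in this summary. In the paper the objective scores the point cloud produced by the selected images as a set, so scoring one view at a time is a simplification.

```python
import math
import random

def rbf(a, b, length_scale=0.3):
    """Squared-exponential kernel between two points given as tuples."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * sq / length_scale ** 2)

def cholesky(mat):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(mat)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = mat[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def forward_solve(low, rhs):
    """Solve low @ x = rhs for lower-triangular low."""
    x = []
    for i, row in enumerate(low):
        x.append((rhs[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x

def backward_solve(low, rhs):
    """Solve low.T @ x = rhs for lower-triangular low."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (rhs[i] - s) / low[i][i]
    return x

def fit_gp(x_obs, y_obs, noise=1e-4):
    """Factor the noisy Gram matrix once; return (cholesky factor, weights)."""
    n = len(x_obs)
    gram = [[rbf(x_obs[i], x_obs[j]) + (noise if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    low = cholesky(gram)
    alpha = backward_solve(low, forward_solve(low, y_obs))
    return low, alpha

def predict_gp(x_obs, low, alpha, x_new):
    """Posterior mean and std of a zero-mean, unit-variance GP at x_new."""
    k_star = [rbf(x, x_new) for x in x_obs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_solve(low, k_star)
    var = max(1.0 - sum(vi * vi for vi in v), 0.0)
    return mean, math.sqrt(var)

def select_views(quality_fn, candidates, n_init=3, n_iters=10, beta=2.0, seed=0):
    """Pick candidate indices by maximizing mean + beta * std (assumed UCB rule)."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(candidates)), n_init)
    scores = [quality_fn(candidates[i]) for i in chosen]
    for _ in range(n_iters):
        offset = sum(scores) / len(scores)  # center targets for the zero-mean GP
        x_obs = [candidates[i] for i in chosen]
        low, alpha = fit_gp(x_obs, [s - offset for s in scores])
        best_idx, best_ucb = None, -math.inf
        for idx, cand in enumerate(candidates):
            if idx in chosen:
                continue  # each viewpoint is used at most once
            mean, std = predict_gp(x_obs, low, alpha, cand)
            if mean + beta * std > best_ucb:
                best_idx, best_ucb = idx, mean + beta * std
        chosen.append(best_idx)
        scores.append(quality_fn(candidates[best_idx]))
    return chosen, scores
```

The surrogate is refit after every evaluation, so each expensive call to the black-box score informs the next choice. The `beta` parameter trades exploring uncertain viewpoints against exploiting ones predicted to score well.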

Abstract

Gaussian splatting (GS), along with its extensions and variants, provides outstanding performance in real-time scene rendering while keeping storage demands low and computation efficient. The selection of 2D images capturing the scene of interest is crucial for the proper initialization and training of GS, and hence markedly affects rendering performance; yet prior works rely on passively and typically densely selected 2D images. In contrast, this paper proposes 'ActiveInitSplat', a novel framework for actively selecting training images for proper initialization and training of GS. ActiveInitSplat relies on density and occupancy criteria of the 3D scene representation resulting from the selected 2D images, to ensure that these images are captured from diverse viewpoints, leading to better scene coverage, and that the initialized Gaussian functions are well aligned with the actual 3D structure. Numerical tests on well-known simulated and real environments demonstrate the merits of ActiveInitSplat, which yields significant GS rendering performance improvements over passive GS baselines in both dense- and sparse-view settings, as measured by the widely adopted LPIPS, SSIM, and PSNR metrics.
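
To make the density and occupancy criteria concrete, here is a hedged sketch of how such quantities could be measured on a point cloud. The abstract does not give the actual definitions or how they are combined, so the voxel grid, the function name `coverage_scores`, and both formulas below are illustrative assumptions rather than the paper's objective.

```python
import math
from collections import Counter

def coverage_scores(points, bounds_min, bounds_max, voxel_size):
    """Return (occupancy, mean_density) for points inside an axis-aligned box.

    occupancy    : fraction of voxels in the box holding at least one point
    mean_density : average number of points per occupied voxel
    Both definitions are assumptions made for illustration.
    """
    # Number of voxels along each axis (at least one per axis).
    dims = [max(1, math.ceil((hi - lo) / voxel_size))
            for lo, hi in zip(bounds_min, bounds_max)]
    counts = Counter()
    for pt in points:
        idx = tuple(math.floor((p - lo) / voxel_size)
                    for p, lo in zip(pt, bounds_min))
        # Ignore points that fall outside the bounding box.
        if all(0 <= i < d for i, d in zip(idx, dims)):
            counts[idx] += 1
    occupancy = len(counts) / math.prod(dims)
    mean_density = sum(counts.values()) / len(counts) if counts else 0.0
    return occupancy, mean_density

# Three points in a unit cube split into 2 x 2 x 2 voxels.
cloud = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.9, 0.9)]
occ, dens = coverage_scores(cloud, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 0.5)
```

In this toy cloud two of the eight voxels are occupied, giving an occupancy of 0.25 and a mean density of 1.5 points per occupied voxel. Views from diverse angles would tend to raise occupancy, while repeated views of the same region would mainly raise density.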

Paper Structure

This paper contains 14 sections, 7 equations, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: ActiveInitSplat in a nutshell. In contrast to existing Gaussian splatting (GS) methods that rely on passively (and possibly densely) collected 2D images of the scene of interest (upper part), ActiveInitSplat actively selects informative images to assist the initialization and training of GS (lower part). The active selection mechanism relies on optimizing a (black-box) function that quantifies the quality of the 3D point cloud resulting from the selected images via density and occupancy criteria. The collected images are captured from diverse viewpoints, ensuring better scene coverage and facilitating the accurate alignment of the initialized Gaussian functions with the underlying 3D structure.
  • Figure 2: Active viewpoint selection process using Gaussian process-based surrogate modeling for black-box optimization.
  • Figure 3: Visual comparison of ActiveInitSplat with the passive selection counterparts in real-world datasets. For each dataset, we illustrate a single indicative test image where the differences are shown with the annotated yellow dashed boxes. When using ActiveInitSplat, the rendered images are closer to the ground-truth ones.
  • Figure 4: The value of (a) the $r_q$ function, (b) LPIPS, (c) SSIM, and (d) PSNR at each iteration of the selection process for all competing methods in the simulated 'office 4' environment [replica19arxiv], where any viewpoint in the 3D space can be selected at each iteration. After 10 iterations, ActiveInitSplat results in a higher-quality 3D scene representation than the baselines, along with superior GS rendering performance. In addition, ActiveInitSplat converges faster to the desired rendering performance.
  • Figure 5: Visual comparison of ActiveInitSplat with the passive selection counterparts in the real-world drone-test platform on three indicative test (novel) viewpoints. These test views are captured from diverse angles and include not only the building within the sports field but also the broader surroundings, including background buildings, trees, and roads, which are particularly challenging to capture in such a large-scale environment. It is evident that the rendered images from ActiveInitSplat have substantially improved quality compared to the passive counterpart.
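
Of the three metrics tracked in Figure 4, PSNR is the simplest to state: it is $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, where MSE is the mean squared pixel error and MAX is the largest possible pixel value. The sketch below assumes images flattened to equal-length sequences of intensities in $[0, 1]$; the function name `psnr` is illustrative. LPIPS and SSIM are omitted because they require a learned network and windowed statistics, respectively.

```python
import math

def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(rendered) != len(reference) or not rendered:
        raise ValueError("images must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Two-pixel toy example with a per-pixel error of 0.1, so MSE = 0.01.
value = psnr([0.5, 0.5], [0.4, 0.6])
```

The toy example evaluates to about 20 dB. Higher PSNR and SSIM indicate renders closer to the ground truth, whereas lower LPIPS is better.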