Frequency-based View Selection in Gaussian Splatting Reconstruction

Monica M. Q. Li, Pierre-Yves Lajoie, Giovanni Beltrame

TL;DR

This work addresses active view selection for 3D Gaussian Splatting (3D-GS) reconstruction, aiming to use as few input images as possible. By ranking candidate views in the frequency domain, the method achieves state-of-the-art view-selection results, demonstrating its potential for efficient image-based 3D reconstruction.

Abstract

Three-dimensional reconstruction is a fundamental problem in robotic perception. We examine active view selection for 3D Gaussian Splatting (3D-GS) reconstruction with as few input images as possible. Although 3D Gaussian Splatting has made significant progress in image rendering and 3D reconstruction, reconstruction quality depends strongly on which 2D images are selected and on the camera poses estimated by Structure-from-Motion (SfM) algorithms. Current view-selection methods, which rely directly on uncertainty from occlusions, depth ambiguities, or neural-network predictions, do not adequately address this issue and struggle to generalize to new scenes. By ranking candidate views in the frequency domain, we can effectively estimate the potential information gain of new viewpoints without ground-truth data. By overcoming the model-architecture and efficacy constraints of current methods, our approach achieves state-of-the-art results in view selection, demonstrating its potential for efficient image-based 3D reconstruction.
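The abstract does not spell out the frequency-domain score, so the following is only a sketch under an assumption: that each candidate view is scored by the power-weighted median radial frequency of its rendering, and that the view with the lowest score (the blurriest, most artifact-laden rendering) is visited next, as the Figure 1 caption describes. The symbols are illustrative, not the paper's notation: I is a W × H grayscale rendering with mean Ī, u and v are signed integer frequency indices, R(c) is the image rendered by the trained 3D-GS model at candidate pose c, and 𝒞 is the set of sampled, not-yet-visited poses.

```latex
\hat{I}(u,v) = \sum_{x=0}^{W-1} \sum_{y=0}^{H-1} \bigl(I(x,y) - \bar{I}\bigr)\, e^{-2\pi i \left(\frac{ux}{W} + \frac{vy}{H}\right)},
\qquad P(u,v) = \bigl|\hat{I}(u,v)\bigr|^{2}

\rho(u,v) = \sqrt{\left(\frac{u}{W}\right)^{2} + \left(\frac{v}{H}\right)^{2}},
\qquad u \in \left[-\tfrac{W}{2}, \tfrac{W}{2}\right),\; v \in \left[-\tfrac{H}{2}, \tfrac{H}{2}\right)

f_{\mathrm{med}}(I) = \min \Bigl\{ f \;:\; \sum_{\rho(u,v) \le f} P(u,v) \;\ge\; \tfrac{1}{2} \sum_{u,v} P(u,v) \Bigr\}

c^{\star} = \operatorname*{arg\,min}_{c \in \mathcal{C}} \; f_{\mathrm{med}}\bigl(R(c)\bigr)
```

Subtracting the mean removes the zero-frequency (DC) term; without it, the average brightness would dominate P(u,v) and pull the median toward zero for every image.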

Paper Structure

This paper contains 18 sections, 4 figures, 2 tables, and 1 algorithm.

Figures (4)

  • Figure 1: The pipeline of the next-view selection method. The scene is initialized with a few images. These images, together with their camera poses and a sparse point cloud generated by SfM, are used to train a 3D-GS model. The trained model then renders images from sampled camera poses that have not yet been visited. The rendered images are transformed to the frequency domain via the fast Fourier transform (FFT), and the camera pose whose rendering has the lowest median frequency is selected as the next view to visit.
  • Figure 2: FFT of rendered images. Blur and artifacts in poorly rendered images appear as low-frequency signals, so a view whose rendering is dominated by low frequencies can be selected as the next view to visit.
  • Figure 3: Trajectories visiting all views, and our selected views. For presentation purposes, our selected views are offset by 4 units along the z-axis. From top to bottom, the scenes are Train, Truck, Dr Johnson, and Playroom.
  • Figure 4: Rendering results. From top to bottom: 100 views selected by our method with coarse training; all views in the dataset with coarse training; 100 views selected by our method with fine training; all views in the dataset with fine training; and ground truth.
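The selection step in Figure 1 (render candidate poses, apply the FFT, pick the pose with the lowest median frequency) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The score is assumed to be the power-weighted median radial frequency of the mean-subtracted grayscale rendering. `median_frequency`, `select_next_view`, and the `render` callback (a stand-in for rendering from the trained 3D-GS model at a candidate pose) are hypothetical names. Candidate-pose sampling, SfM, and 3D-GS training are not shown.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

import numpy as np

Pose = TypeVar("Pose")


def median_frequency(image: np.ndarray) -> float:
    """Power-weighted median radial frequency (cycles/pixel) of a 2D grayscale image.

    Assumed score: sort all spectral bins by radial frequency and return the
    frequency at which the cumulative power first reaches half the total.
    """
    img = np.asarray(image, dtype=np.float64)
    img = img - img.mean()  # remove the DC term so it cannot dominate the spectrum
    power = np.abs(np.fft.fft2(img)) ** 2

    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]  # signed vertical frequency of each FFT row
    fx = np.fft.fftfreq(w)[None, :]  # signed horizontal frequency of each FFT column
    radius = np.sqrt(fx**2 + fy**2)

    order = np.argsort(radius, axis=None)  # bins from lowest to highest frequency
    radius_sorted = radius.ravel()[order]
    cumulative = np.cumsum(power.ravel()[order])
    if cumulative[-1] == 0.0:
        return 0.0  # constant image: no energy beyond DC
    idx = int(np.searchsorted(cumulative, 0.5 * cumulative[-1]))
    return float(radius_sorted[idx])


def select_next_view(
    candidate_poses: Sequence[Pose],
    render: Callable[[Pose], np.ndarray],
) -> Tuple[Pose, List[float]]:
    """Score every candidate pose and return the one with the lowest median frequency.

    `render` is a hypothetical callback standing in for rendering a grayscale
    image from the trained 3D-GS model at the given camera pose.
    """
    scores = [median_frequency(render(pose)) for pose in candidate_poses]
    best = int(np.argmin(scores))
    return candidate_poses[best], scores
```

For example, with a stand-in renderer that returns a random texture for one pose and a box-blurred copy of that texture for another, the blurred pose receives the lower score and is selected, matching the intuition of Figure 2. Taking the median over power, rather than a fixed high-frequency cutoff, keeps the score free of a scene-specific threshold.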