Finding Waldo: Towards Efficient Exploration of NeRF Scene Spaces

Evangelos Skartados, Mehmet Kerim Yucel, Bruno Manganelli, Anastasios Drosou, Albert Saà-Garriga

TL;DR

This work defines and formalizes the scene exploration problem for NeRF-based scene representations: discovering camera poses that render views satisfying user-defined criteria. It introduces two naive baselines, Guided-Random Search (GRS) and Pose Interpolation-based Search (PIBS), and proposes Evolution-Guided Pose Search (EGPS), a task-agnostic optimization method that uses a genetic-algorithm framework to balance accuracy and exploration. The three methods are evaluated on real-world scenes under criteria such as photo-composition, saliency, and image quality, and EGPS generally outperforms the baselines in generating diverse, high-quality novel views. Efficient exploration of NeRF scene spaces has practical implications for content creation, multimedia production, and VR/AR applications, and the work points to future research on robust criteria, multi-criteria optimization, and temporal pose trajectories.

Abstract

Neural Radiance Fields (NeRF) have quickly become the primary approach for 3D reconstruction and novel view synthesis in recent years due to their remarkable performance. Despite the huge interest in NeRF methods, a practical use case of NeRFs has largely been ignored: the exploration of the scene space modelled by a NeRF. In this paper, for the first time in the literature, we propose and formally define the scene exploration framework as the efficient discovery of NeRF model inputs (i.e. coordinates and viewing angles) from which one can render novel views that adhere to user-selected criteria. To remedy the lack of approaches addressing scene exploration, we first propose two baseline methods called Guided-Random Search (GRS) and Pose Interpolation-based Search (PIBS). We then cast scene exploration as an optimization problem, and propose the criteria-agnostic Evolution-Guided Pose Search (EGPS) for efficient exploration. We test all three approaches with various criteria (e.g. saliency maximization, image quality maximization, photo-composition quality improvement) and show that our EGPS performs more favourably than the two baselines. We finally highlight key points and limitations, and outline directions for future research in scene exploration.
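
To make the optimization view concrete, the following is a minimal sketch of a generic genetic-algorithm search over pose vectors. It is not the authors' EGPS implementation, whose details are not given in this section. The function name `evolve_poses`, its parameters, the 5-value pose layout (3D coordinates plus two viewing angles) and the toy scoring function are all assumptions made for illustration. In a real setting the scoring function would render the pose with a NeRF and evaluate the image against the chosen criterion.

```python
import random

def evolve_poses(score_fn, seed_poses, generations=20, pop_size=32,
                 elite_frac=0.25, sigma=0.1, rng=None):
    """Generic genetic-algorithm search over pose vectors (hypothetical sketch)."""
    rng = rng or random.Random(0)
    # Initialise the population by cycling through the seed poses
    # (e.g. the poses of the training images).
    population = [list(seed_poses[i % len(seed_poses)]) for i in range(pop_size)]
    n_elite = max(2, int(elite_frac * pop_size))
    for _ in range(generations):
        # Selection: keep the highest-scoring poses unchanged (elitism).
        population.sort(key=score_fn, reverse=True)
        elites = population[:n_elite]
        children = []
        while len(elites) + len(children) < pop_size:
            a, b = rng.sample(elites, 2)
            # Crossover: take each coordinate from one of the two parents.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Mutation: small Gaussian perturbation of every coordinate.
            child = [v + rng.gauss(0.0, sigma) for v in child]
            children.append(child)
        population = elites + children
    return max(population, key=score_fn)

# Toy criterion (assumption): closeness to a fixed target pose stands in for
# "render the view and score it with a saliency, quality or composition model".
target = [0.5, -0.2, 1.0, 0.3, 0.1]

def toy_score(pose):
    return -sum((p - t) ** 2 for p, t in zip(pose, target))

seeds = [[0.0, 0.0, 0.0, 0.0, 0.0],
         [1.0, 1.0, 1.0, 0.0, 0.0],
         [-1.0, 0.0, 2.0, 0.5, 0.0]]
best = evolve_poses(toy_score, seeds)
```

Because the top-scoring poses are carried over unchanged in each generation, the best score in this sketch never drops below that of the best seed pose. The mutation scale `sigma` controls how far the search moves away from poses that already score well.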

Paper Structure

This paper contains 15 sections, 8 equations, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: Qualitative examples of each search method on the scenes fox and family for photo-composition improvement. For each scene, the top row shows the best training images, the second row the best results of GRS, the third row the best results of PIBS, and the last row the results of EGPS. The scores overlaid on the images are generated with zhang2021image.
  • Figure 2: Qualitative examples of each search method on the scenes fox and family for image quality maximization. For each scene, the top row shows the best training images, the second row the best results of GRS, the third row the best results of PIBS, and the last row the results of EGPS. The scores overlaid on the images are generated with madhusudana2022image.
  • Figure 3: Qualitative examples of each search method on the scenes horns and family for saliency maximization. For each scene, the top row shows the best training images, the second row the best results of GRS, the third row the best results of PIBS, and the last row the results of EGPS. The numbers overlaid on the images are the counts of pixels predicted to be salient by qin2019basnet.
  • Figure 4: Visualization of the camera poses of the family scene, where the scene centre is marked by the axis lines. Each point corresponds to an image pose (training or novel); green points are above an IQA score threshold and red points are not. The left-most image shows the poses of the images available at the start, the middle column shows the poses generated in the first epoch, and the right-most column shows the poses generated after 5 epochs. From top to bottom, GRS, PIBS and EGPS results are shown. Better viewed when zoomed in.