VF-NeRF: Viewshed Fields for Rigid NeRF Registration

Leo Segre, Shai Avidan

TL;DR

VF-NeRF presents Viewshed Fields (VF), an implicit field trained with Normalizing Flows to identify 3D points that were well-covered by the original camera set in NeRF scenes. By sampling high-VF oriented points, the method generates informative novel views and constructs VF-guided initializations and rays for robust 6-DoF registration between NeRFs without known camera poses. The approach achieves state-of-the-art results across LLFF, casually captured scenes, and Objaverse datasets, highlighting strong initialization, robust optimization, and resilience to illumination changes and noise. This VF+NF framework offers a scalable, modular pathway for NeRF-to-NeRF alignment and provides versatile sampling for both novel views and sparse point clouds.

Abstract

3D scene registration is a fundamental problem in computer vision that seeks the best 6-DoF alignment between two scenes. This problem has been extensively investigated for point clouds and meshes, but there has been relatively limited work regarding Neural Radiance Fields (NeRF). In this paper, we consider the problem of rigid registration between two NeRFs when the positions of the original cameras are not given. Our key novelty is the introduction of Viewshed Fields (VF), an implicit function that determines, for each 3D point, how likely it is to be viewed by the original cameras. We demonstrate how VF can help in the various stages of NeRF registration, with an extensive evaluation showing that VF-NeRF achieves SOTA results on various datasets with different capturing approaches, such as LLFF and Objaverse.

Paper Structure (34 sections, 7 equations, 12 figures, 7 tables)

Figures (12)

  • Figure 2: Novel View Generation: Randomly sampling novel camera parameters often leads to non-informative images. For example, the red-marked image on the left was generated using a camera that lies on the unit sphere looking towards the origin. In contrast, using our novel Viewshed Fields (VF) representation, we are able to generate informative camera positions (green-marked images on the right) that can then be used to register two NeRFs.
  • Figure 3: VF-NeRF: (Left) Our VF-NeRF consists of two parts. The first is a NeRF network with the standard RGB and $\sigma$ outputs, along with depth estimation. The second is a simple normalizing-flows network whose input is a point on the surface and the camera direction (i.e., an oriented point) and whose output is a log-likelihood estimate that is maximized during training. (Right) To generate novel views, we sample from the 6-dimensional Gaussian in the normalizing-flows latent space. We then recover the oriented point $(\tilde{x}, \vec{d})$ and use the origin-reconstruction equation to reconstruct the camera origin $\vec{o}$. Finally, we render the view of the camera at position $\vec{o}$ with direction $\vec{d}$.
  • Figure 4: Viewshed Fields: (Left) During NeRF training, we sample oriented points (blue) around surfaces in the scene and use Normalizing Flows (NF) to map them to a Gaussian in latent space. (Right) During NeRF registration, we sample a high-visibility oriented point (green) from the Gaussian and map it to the input space, where it is used to determine the position of the novel camera. The remaining panels show a novel view synthesized using our method and the corresponding viewshed map.
  • Figure 5: VF pixel sampling: (Left) VF map of a novel view of the Fern scene from the LLFF dataset. (Middle) Green pixels sampled using the VF map. (Right) Red pixels sampled randomly. The VF mask guides the process toward pixels with more reliable RGB values.
  • Figure 6: Point clouds from VF: Point clouds generated by sampling from the VF distribution, as explained in the initialization subsection. Each point cloud shown is a combination of two point clouds from two NeRFs after applying our registration method. The examples are taken from all the datasets we evaluate in the paper: the phone example from the Objaverse dataset, the horns example from the LLFF dataset, and the table through lion examples from our casually captured dataset.
  • ...and 7 more figures
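
The novel-view sampling described in the Figure 3 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `sample_cameras` and `toy_inverse_flow`, the `depth` parameter, and the rule $\vec{o} = \tilde{x} - t\,\vec{d}$ for stepping back from the surface point are all assumptions made for illustration, since the paper's origin-reconstruction equation is not reproduced here. A fixed affine map stands in for the inverse of a trained normalizing flow.

```python
import numpy as np

def sample_cameras(inverse_flow, n, depth, rng):
    """Sample n (origin, direction) camera pairs from a 6-D latent Gaussian."""
    z = rng.standard_normal((n, 6))           # latent samples z ~ N(0, I_6)
    oriented = inverse_flow(z)                # map back to oriented points (x, d)
    x, d = oriented[:, :3], oriented[:, 3:]
    d = d / np.linalg.norm(d, axis=1, keepdims=True)  # unit viewing directions
    # ASSUMPTION: place the camera `depth` units behind the surface point
    # along the viewing direction, so that x = o + depth * d.
    o = x - depth * d
    return o, d

def toy_inverse_flow(z):
    """Hypothetical stand-in for a trained flow's inverse: a fixed affine map."""
    scale = np.array([0.5, 0.5, 0.5, 0.1, 0.1, 0.1])
    shift = np.array([0.0, 0.0, 0.0, 0.0, 0.0, -1.0])
    return z * scale + shift

rng = np.random.default_rng(0)
origins, directions = sample_cameras(toy_inverse_flow, n=4, depth=2.0, rng=rng)
```

In an actual pipeline the affine stand-in would be replaced by the inverse pass of the trained flow, and each sampled `(origins[i], directions[i])` pair would be handed to the NeRF renderer to produce a candidate view.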