ProvNeRF: Modeling per Point Provenance in NeRFs as a Stochastic Field

Kiyohiro Nakayama, Mikaela Angelina Uy, Yang You, Ke Li, Leonidas J. Guibas

TL;DR

Modeling per-point provenance during NeRF optimization enriches the model with information on triangulation, which improves novel view synthesis and uncertainty estimation over competitive baselines in the challenging sparse, unconstrained view setting.

Abstract

Neural radiance fields (NeRFs) have gained popularity with multiple works showing promising results across various applications. However, to the best of our knowledge, existing works do not explicitly model the distribution of training camera poses, or consequently the triangulation quality, a key factor affecting reconstruction quality dating back to classical vision literature. We close this gap with ProvNeRF, an approach that models the provenance for each point (i.e., the locations where it is likely visible) of NeRFs as a stochastic field. We achieve this by extending implicit maximum likelihood estimation (IMLE) to functional space with an optimizable objective. We show that modeling per-point provenance during the NeRF optimization enriches the model with information on triangulation, leading to improvements in novel view synthesis and uncertainty estimation under the challenging sparse, unconstrained view setting against competitive baselines.
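
For readers unfamiliar with IMLE, the following is a minimal sketch of the standard finite-dimensional IMLE objective: each data point is matched to its nearest generated sample, and the mean squared distance to that sample is the loss. It is not the functional-space extension proposed in the paper. The function name `imle_loss`, the linear generator `theta`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def imle_loss(data, samples):
    """Standard IMLE loss on vectors (illustrative, not the paper's objective).

    data:    (N, D) array of observed points.
    samples: (K, D) array of generated samples.
    Returns the mean squared distance from each data point to its nearest
    sample, and the index of that nearest sample for every data point.
    """
    # Pairwise squared distances, shape (N, K), via broadcasting.
    diffs = data[:, None, :] - samples[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # For every data point, pick the closest generated sample.
    nearest = np.argmin(sq_dists, axis=1)
    loss = float(np.mean(sq_dists[np.arange(data.shape[0]), nearest]))
    return loss, nearest


# Toy usage: a hypothetical linear "generator" maps K latent codes to samples.
rng = np.random.default_rng(0)
latents = rng.normal(size=(8, 2))      # K = 8 latent codes
theta = np.eye(2)                      # hypothetical generator parameters
generated = latents @ theta.T          # (8, 2) generated samples
observed = np.array([[0.0, 0.0], [1.0, 1.0]])
loss, nearest = imle_loss(observed, generated)
```

In an actual training loop, the generator parameters would be updated by gradient descent on this loss, with fresh latent codes drawn periodically; that part is omitted here.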

Paper Structure (44 sections, 21 equations, 15 figures, 5 tables)

Figures (15)

  • Figure 1: (Left) ProvNeRF models a provenance field that outputs provenances for each 3D point as likely samples (arrows). For 3D points (brown triangle and blue circle), the corresponding provenances (illustrated by the arrows) are locations that likely observe them. (Right) ProvNeRF enables better novel view synthesis and estimation of the uncertainty of the capturing process because it models the locations of likely observations, which are critical for NeRF's optimization.
  • Figure 2: Complex influence of camera baseline distance on the 3D reconstruction. Right: With a wide baseline, the reconstruction is more robust against 2D measurement noise. However, it is more likely to omit hidden surfaces because the invisible region is larger than for a small-baseline camera pair. Left: With a small baseline, the 3D reconstruction is less likely to suffer from occlusions as the invisible region between cameras is small. However, the reconstruction can be noisy due to large stereo range errors (a small amount of noise in the 2D measurement causes a large deviation in depth).
  • Figure 3: Training pipeline for ProvNeRF. For each point $\bm{x}$ seen from provenance tuple $(\hat{t}, \hat{\bm{d}})$, with direction $\bm{d}$ at distance $t$, we first sample $K$ latent random functions $\{\bm{Z}_j\}$ from distribution $\mathcal{Z}$. The learned transformation $\bm{H}_\theta$ transforms each $\bm{Z}_j(\bm{x})$ to a provenance sample $\bm{D}_\theta^{(j)}(\bm{x})$. Finally, $\bm{H}_{\theta}$ is trained with $\mathcal{L}_{\text{ProvNeRF}}$ as defined in Eq. \ref{eq:our_obj}.
  • Figure 4: Visual effect of $\mathcal{L}_{\text{ProvNVS}}$ in Eq. \ref{eq:nvs}. Compared to the pre-trained SCADE model, our method can remove additional floaters in the scene (see the boxed region).
  • Figure 5: Qualitative results for uncertainty modeling. We visualize our uncertainty maps obtained using the method described in Sec. \ref{sec_uncertainty}. The uncertainty and depth error maps are shown with the specified color bars. Uncertainty values and depth errors are normalized per test image so that the results are comparable. As shown in the boxed regions, our method predicts uncertainty regions that correlate better with the predicted depth errors.
  • ...and 10 more figures
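
The stereo range error mentioned in the Figure 2 caption can be illustrated with a small worked example. The sketch below assumes the textbook rectified two-camera model, in which depth is $Z = fB/d$ for focal length $f$ (pixels), baseline $B$ (meters), and disparity $d$ (pixels), so a disparity error $\delta d$ produces a first-order depth error of roughly $Z^2 \, \delta d / (fB)$. This model, the function names, and the numeric values are assumptions for illustration and are not taken from the paper.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point in a rectified stereo pair: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


def first_order_depth_error(focal_px, baseline_m, depth_m, disparity_noise_px):
    """First-order depth error caused by disparity noise: Z^2 * dd / (f * B)."""
    return depth_m ** 2 * disparity_noise_px / (focal_px * baseline_m)


# Hypothetical setup: 500 px focal length, point at 2 m, 0.5 px matching noise.
focal_px, depth_m, noise_px = 500.0, 2.0, 0.5

small_baseline_err = first_order_depth_error(focal_px, 0.05, depth_m, noise_px)
wide_baseline_err = first_order_depth_error(focal_px, 0.50, depth_m, noise_px)

# Cross-check the approximation by perturbing the disparity directly.
true_disparity = focal_px * 0.05 / depth_m
perturbed_depth = depth_from_disparity(focal_px, 0.05, true_disparity - noise_px)
exact_small_baseline_err = perturbed_depth - depth_m

print(small_baseline_err, wide_baseline_err, exact_small_baseline_err)
```

Under this model, a tenfold increase in baseline reduces the depth error tenfold for the same pixel noise, which is the robustness side of the trade-off. The occlusion side (larger invisible regions with wide baselines) is not captured by this formula.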