NeRFPrior: Learning Neural Radiance Field as a Prior for Indoor Scene Reconstruction

Wenyuan Zhang, Emily Yue-ting Jia, Junsheng Zhou, Baorui Ma, Kanle Shi, Yu-Shen Liu, Zhizhong Han

TL;DR

This work addresses indoor scene reconstruction from multi-view RGB images by overcoming limitations of data-driven priors through a per-scene NeRF prior. It trains a grid-based NeRF quickly to provide geometry and color cues, and uses volume rendering to learn a signed distance function with a multi-view consistency constraint and a depth consistency loss for textureless regions. The approach yields state-of-the-art results on multiple indoor benchmarks while reducing training time by leveraging fast NeRF priors and avoiding additional data. Overall, NeRFPrior offers a data-efficient, high-fidelity pathway for implicit surface reconstruction in indoor environments.

Abstract

Recently, it has been shown that priors are vital for neural implicit functions to reconstruct high-quality surfaces from multi-view RGB images. However, current priors require large-scale pre-training and provide only geometric clues, without considering the importance of color. In this paper, we present NeRFPrior, which adopts a neural radiance field as a prior to learn signed distance fields using volume rendering for surface reconstruction. Our NeRF prior provides both geometric and color clues, and can be trained quickly on the same scene without additional data. Based on the NeRF prior, we learn a signed distance function (SDF) by explicitly imposing a multi-view consistency constraint on each ray intersection for surface inference. Specifically, at each ray intersection, we use the density in the prior as a coarse geometry estimate, and use the color near the surface as a clue to check its visibility from another view angle. For textureless areas where the multi-view consistency constraint does not work well, we further introduce a depth consistency loss with confidence weights to infer the SDF. In our experiments, NeRFPrior outperforms state-of-the-art methods on widely used benchmarks.
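
The visibility check described in the abstract can be illustrated with a small sketch. It composites the prior's colors over a few samples near the ray intersection using the standard NeRF alpha-compositing formula, then compares the result with the color observed in the other view. The function names, the color-distance test, and the threshold `tol` are assumptions made for illustration; the abstract does not specify the exact criterion the authors use.

```python
import numpy as np

def composite_color(sigmas, colors, deltas):
    """Standard alpha compositing over samples along a ray segment."""
    sigmas = np.asarray(sigmas, dtype=float)   # prior densities, shape (N,)
    colors = np.asarray(colors, dtype=float)   # prior colors, shape (N, 3)
    deltas = np.asarray(deltas, dtype=float)   # sample spacings, shape (N,)
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance reaching each sample: product of (1 - alpha) of earlier samples.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = trans * alphas
    return weights @ colors, weights

def is_visible(sigmas, colors, deltas, projected_color, tol=0.1):
    """Hypothetical test: visible if rendered and projected colors agree within tol."""
    rendered, _ = composite_color(sigmas, colors, deltas)
    diff = np.linalg.norm(rendered - np.asarray(projected_color, dtype=float))
    return bool(diff < tol)

# Three samples near an intersection: empty space, then a dense red surface.
sigmas = [0.0, 0.0, 1e3]
colors = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
deltas = [0.1, 0.1, 0.1]
rendered, weights = composite_color(sigmas, colors, deltas)
```

In this toy setup nearly all of the weight falls on the third sample, so the rendered color is close to red; a source view that also observes red at the projected pixel would pass the check, while one that observes a different color (for example because of occlusion) would be excluded.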

Paper Structure

This paper contains 15 sections, 11 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: An overview of our NeRFPrior method. Given multi-view images of a scene as input, we first train a grid-based NeRF to obtain the density field and color field as priors. We then learn a signed distance function by imposing a multi-view consistency constraint using volume rendering. For each sampled point on the ray, we query the prior density and prior color as additional supervision of the predicted density and color, respectively. To improve the smoothness and completeness of textureless areas in the scene, we propose a depth consistency loss, which forces surface points in the same textureless plane to have similar depths.
  • Figure 2: Comparison on object-surrounding scenes between MonoSDF and ours. The performance of MonoSDF degrades drastically because the depth prior does not generalize well to different kinds of datasets.
  • Figure 3: An illustration of our multi-view consistency constraint. To judge the visibility of the intersection, we conduct a local-prior volume rendering around the intersection and compare the rendered color with the projected color. The ray from the source view participates in training only if the intersection is visible along this ray.
  • Figure 4: A comparison of the accuracy of the visibility check. The first row shows the ground-truth result of projecting pixels from the reference view to the source view. The second row shows the visibility mask, indicating which points in the reference view are visible after projection. The third row shows the error map of the visibility check.
  • Figure 5: An illustration of our depth consistency loss. We calculate the density variance of the intersection and its neighboring points on the tangent plane. If the variance is small, as in (a), we constrain these points to maintain the same depth along the normal direction, as in (c). Otherwise, as in (b), we do not impose depth constraints.
  • ...and 5 more figures
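
The variance gating described in the Figure 5 caption can be sketched as follows. This is a minimal illustration, not the authors' loss: the function name, the L1 penalty, and the threshold `var_threshold` are assumptions, and the confidence weights mentioned in the abstract are omitted because their form is not given here. Depths are assumed to be measured along the normal direction.

```python
import numpy as np

def depth_consistency_loss(center_depth, neighbor_depths, densities,
                           var_threshold=0.01):
    """Penalize depth differences only where the prior density is nearly constant.

    densities: prior densities at the intersection and its tangent-plane neighbors.
    neighbor_depths: depths of the neighboring points along the normal direction.
    """
    densities = np.asarray(densities, dtype=float)
    neighbor_depths = np.asarray(neighbor_depths, dtype=float)
    # High density variance suggests the neighbors are not on one plane: no constraint.
    if np.var(densities) >= var_threshold:
        return 0.0
    # Low variance: pull neighbor depths toward the depth at the intersection.
    return float(np.mean(np.abs(neighbor_depths - center_depth)))

# Case (a): uniform density, so the depth constraint is active.
flat_loss = depth_consistency_loss(1.0, [1.0, 1.2, 0.8], [5.0, 5.0, 5.0, 5.0])
# Case (b): strongly varying density, so no constraint is imposed.
edge_loss = depth_consistency_loss(1.0, [1.0, 1.2, 0.8], [0.0, 10.0, 0.0, 10.0])
```

The hard threshold is the simplest way to express the two cases in the caption; a soft weighting by variance would be an equally plausible reading.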