Enhancing Neural Radiance Fields with Depth and Normal Completion Priors from Sparse Views
Jiawei Guo, HungChyun Chou, Ning Ding
TL;DR
CP_NeRF tackles the challenge of rendering accurate views from a limited number of photographs by introducing dense depth and normal completion priors derived from sparse SfM depth maps. The method predicts dense depth and normal maps with uncertainties using two prior networks and integrates them into a three-branch NeRF with per-image embeddings and an optical center embedder, along with normal patch matching for supervision. Empirical results on ScanNet show consistent improvements in novel-view synthesis and geometry over baselines, while mitigating artifacts such as floaters in sparse-view settings. The approach offers a practical path to high-quality indoor scene rendering from few views by using uncertainty-aware priors to guide sampling and supervision.
Abstract
Neural Radiance Fields (NeRF) synthesize highly realistic images by learning a scene representation with a neural network. However, NeRF often fails to render views accurately when only a few input images are available, because it lacks sufficient structural information to guide the rendering process. To address this, we propose a Depth and Normal Dense Completion Priors for NeRF (CP_NeRF) framework. This framework enhances view rendering by adding depth and normal dense completion priors to the NeRF optimization process. Before optimizing NeRF, we obtain sparse depth maps from the same Structure from Motion (SfM) step that provides the camera poses. Based on the sparse depth maps and a normal estimator, we generate sparse normal maps for training a normal completion prior with precise standard deviations. During optimization, we apply the depth and normal completion priors to transform sparse data into dense depth and normal maps with their standard deviations. We use these dense maps to guide ray sampling, assist distance sampling, and construct a normal loss function for better training accuracy. To improve the rendering of NeRF's normal outputs, we incorporate an optical center position embedder that helps synthesize more accurate normals through volume rendering. Additionally, we employ a normal patch matching technique to select accurate rendered normal maps, ensuring more precise supervision for the model. Our method outperforms leading techniques in rendering detailed indoor scenes, even with limited input views.
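To make the idea of using a dense depth map and its standard deviation to assist distance sampling more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes one common scheme: half of the samples on each ray are stratified uniformly over the near/far range, and the other half are drawn from a Gaussian centered on the prior depth with the prior's standard deviation. The function name `depth_guided_samples`, its parameters, and the even split between the two sample sets are all hypothetical choices made for illustration.

```python
import numpy as np


def depth_guided_samples(depth, std, near, far, n_uniform=32, n_guided=32, rng=None):
    """Sketch of depth-prior-guided distance sampling (assumed scheme).

    depth, std: per-ray prior depth and standard deviation, shape (n_rays,).
    Returns sorted sample distances, shape (n_rays, n_uniform + n_guided).
    """
    rng = np.random.default_rng() if rng is None else rng
    depth = np.asarray(depth, dtype=np.float64)
    std = np.asarray(std, dtype=np.float64)
    n_rays = depth.shape[0]

    # Stratified uniform samples: one random point in each of n_uniform bins.
    edges = np.linspace(near, far, n_uniform + 1)
    lower, upper = edges[:-1], edges[1:]
    t_uniform = lower + (upper - lower) * rng.random((n_rays, n_uniform))

    # Guided samples: Gaussian around the prior depth; a small std (confident
    # prior) concentrates samples near the predicted surface.
    t_guided = rng.normal(depth[:, None], std[:, None], size=(n_rays, n_guided))
    t_guided = np.clip(t_guided, near, far)

    # Volume rendering expects distances in increasing order along each ray.
    return np.sort(np.concatenate([t_uniform, t_guided], axis=1), axis=1)


# Hypothetical usage: two rays, one with a confident prior and one without.
prior_depth = np.array([2.0, 3.5])
prior_std = np.array([0.05, 0.5])
t_vals = depth_guided_samples(prior_depth, prior_std, near=0.5, far=6.0,
                              rng=np.random.default_rng(0))
```

Under these assumptions, a ray with a confident prior spends most of its guided samples close to the predicted surface, while an uncertain prior spreads them more widely, and the uniform samples keep the whole range covered in case the prior is wrong.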
