Depth-Guided Robust and Fast Point Cloud Fusion NeRF for Sparse Input Views

Shuai Guo, Qiuwen Wang, Yijie Gao, Rong Xie, Li Song

TL;DR

This work proposes a depth-guided robust and fast point cloud fusion NeRF for sparse inputs that can achieve faster reconstruction and greater compactness through effective vector-matrix decomposition.

Abstract

Novel-view synthesis with sparse input views is important for real-world applications like AR/VR and autonomous driving. Recent methods have integrated depth information into NeRFs for sparse input synthesis, leveraging depth prior for geometric and spatial understanding. However, most existing works tend to overlook inaccuracies within depth maps and have low time efficiency. To address these issues, we propose a depth-guided robust and fast point cloud fusion NeRF for sparse inputs. We perceive radiance fields as an explicit voxel grid of features. A point cloud is constructed for each input view, characterized within the voxel grid using matrices and vectors. We accumulate the point cloud of each input view to construct the fused point cloud of the entire scene. Each voxel determines its density and appearance by referring to the point cloud of the entire scene. Through point cloud fusion and voxel grid fine-tuning, inaccuracies in depth values are refined or substituted by those from other views. Moreover, our method can achieve faster reconstruction and greater compactness through effective vector-matrix decomposition. Experimental results underline the superior performance and time efficiency of our approach compared to state-of-the-art baselines.
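The abstract credits the compactness to vector-matrix decomposition but does not spell out the factorization. The sketch below assumes a TensoRF-style form, in which a dense feature grid is approximated by a sum of vector-matrix outer products along each axis. The function names, shapes, and the `rank` parameter are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def vm_reconstruct(vx, vy, vz, myz, mxz, mxy):
    """Rebuild a dense (X, Y, Z) grid from vector-matrix factors.

    Assumed (hypothetical) shapes, with R components per axis:
      vx: (R, X), myz: (R, Y, Z)
      vy: (R, Y), mxz: (R, X, Z)
      vz: (R, Z), mxy: (R, X, Y)
    """
    # Each term pairs a vector along one axis with a matrix over the other two.
    grid = np.einsum("rx,ryz->xyz", vx, myz)
    grid += np.einsum("ry,rxz->xyz", vy, mxz)
    grid += np.einsum("rz,rxy->xyz", vz, mxy)
    return grid

def vm_param_count(res, rank):
    """Parameters stored by the factors for a cubic grid of side `res`."""
    # Three axes, each with `rank` vectors (res values) and matrices (res * res values).
    return 3 * rank * (res + res * res)
```

Under these assumptions, a cubic grid of side 128 with 16 components per axis stores 792,576 factor values instead of 2,097,152 dense voxels, which illustrates where the saving in model size can come from.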

Paper Structure (30 sections, 15 equations, 7 figures, 2 tables)


Figures (7)

  • Figure 1: Depth-guided sparse input NeRF should overcome the effects of inaccurate depth values. This example illustrates a synthesis result of our method on the LLFF dataset.
  • Figure 2: We compare our method with previous methods in terms of rendering quality (PSNR) and model size. Point sizes correspond to PSNRs. With effective vector-matrix decomposition and point cloud representation, our work delivers superior rendering quality, faster reconstruction, and greater compactness.
  • Figure 3: Overview of our method. We perceive radiance fields as an explicit voxel grid of features. With RGB-D images and camera parameters of $n$ sparse input views, we first map pixel points into 3D space to construct a point cloud for each view, represented by vectors and matrices. Then we accumulate the point cloud of each input view to construct the fused point cloud of the entire scene. For each shading location $\mathrm{x}_w=(x,y,z)$, we use sampled values from the vectors and matrices to compute the corresponding values of the tensor component. The appearance values are sent to a decoding MLP $S$ for color regression. The loss function is composed of RGB loss and depth loss.
  • Figure 4: Inaccurate depth values can result in inaccurate 3D points. Through point cloud fusion and radiance field optimization, these inaccurate 3D points are substituted with accurate ones from other views. The squares represented by the dotted edges indicate inaccurate 3D points.
  • Figure 5: Qualitative comparisons on the LLFF dataset with two input views. Notably, the predictions from DSNeRF and DDP-NeRF exhibit noticeable floater artifacts. RegNeRF struggles to capture the finer details in bone structures. In contrast, our method significantly reduces these imperfections. In the second and fourth examples, we highlight the color changes predicted by DDP-NeRF. Our model's predictions are free from the aforementioned artifacts.
  • ...and 2 more figures
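
The first two steps in the Figure 3 caption, mapping pixels into 3D space per view and then accumulating the views, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a pinhole camera with intrinsics `K`, depth stored as distance along the camera z-axis, an OpenCV-style camera frame (x right, y down, z forward), and a 4x4 camera-to-world matrix `c2w`. The function names are hypothetical, and the fusion here is plain concatenation, whereas the paper represents the fused cloud with vectors and matrices inside the voxel grid.

```python
import numpy as np

def backproject_depth(depth, K, c2w):
    """Lift every pixel of one depth map to a world-space 3D point.

    depth: (H, W) z-depth per pixel (assumed convention)
    K:     (3, 3) pinhole intrinsics
    c2w:   (4, 4) camera-to-world transform
    Returns an (H * W, 3) array of points in row-major pixel order.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T        # camera-space directions with z = 1
    pts_cam = rays * depth.reshape(-1, 1)  # scale each direction by its depth
    return pts_cam @ c2w[:3, :3].T + c2w[:3, 3]

def fuse_point_clouds(clouds):
    """Accumulate per-view clouds into one scene cloud (simple concatenation)."""
    return np.concatenate(clouds, axis=0)
```

With several views in one cloud, a location covered by an inaccurate depth value in one view can still receive points from the others, which is the situation Figure 4 describes.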