TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization

Zhen Tan, Zongtan Zhou, Yangbing Ge, Zi Wang, Xieyuanli Chen, Dewen Hu

TL;DR

Truncated Depth NeRF (TD-NeRF) is a novel approach that trains NeRF from images with unknown camera poses by jointly optimizing the learnable parameters of the radiance field and the camera poses, guided by monocular depth priors.

Abstract

The reliance on accurate camera poses is a significant barrier to the widespread deployment of Neural Radiance Fields (NeRF) models for 3D reconstruction and SLAM tasks. The existing method, which introduces monocular depth priors to jointly optimize the camera poses and NeRF, fails to fully exploit the depth priors and neglects the impact of their inherent noise. In this paper, we propose Truncated Depth NeRF (TD-NeRF), a novel approach that enables training NeRF from unknown camera poses by jointly optimizing the learnable parameters of the radiance field and the camera poses. Our approach explicitly utilizes monocular depth priors through three key advancements: 1) we propose a novel depth-based ray sampling strategy based on the truncated normal distribution, which improves the convergence speed and accuracy of pose estimation; 2) to circumvent local minima and refine depth geometry, we introduce a coarse-to-fine training strategy that progressively improves the depth precision; 3) we propose a more robust inter-frame point constraint that enhances robustness against depth noise during training. Experimental results on three datasets demonstrate that TD-NeRF outperforms prior works in the joint optimization of camera pose and NeRF and generates more accurate depth geometry. The implementation of our method has been released at https://github.com/nubot-nudt/TD-NeRF.
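
To make the first two ideas in the abstract concrete, the following is a minimal sketch of depth-guided ray sampling from a truncated normal distribution. It is not the released implementation: the function name `sample_truncated_normal_depths`, the rejection-sampling approach, the `near`/`far` bounds, and the two standard deviations used for the "coarse" and "fine" stages are all assumptions chosen for illustration.

```python
import numpy as np

def sample_truncated_normal_depths(depth_prior, sigma, near, far, n_samples, rng):
    """Draw sorted sample depths along one ray from N(depth_prior, sigma^2)
    truncated to [near, far]. Hypothetical helper, for illustration only."""
    # Clamp the mean into the valid range so rejection sampling always terminates.
    mean = float(np.clip(depth_prior, near, far))
    samples = np.empty(0)
    while samples.size < n_samples:
        # Draw from the untruncated normal and keep only draws inside [near, far].
        draw = rng.normal(mean, sigma, size=2 * n_samples)
        draw = draw[(draw >= near) & (draw <= far)]
        samples = np.concatenate([samples, draw])
    # Volume rendering expects samples ordered from near to far.
    return np.sort(samples[:n_samples])

rng = np.random.default_rng(0)
# Assumed schedule: a wide distribution early in training, a narrow one later.
coarse = sample_truncated_normal_depths(2.0, 1.0, near=0.1, far=6.0, n_samples=64, rng=rng)
fine = sample_truncated_normal_depths(2.0, 0.1, near=0.1, far=6.0, n_samples=64, rng=rng)
print(coarse.std(), fine.std())
```

With a large standard deviation the samples cover much of the ray, which tolerates a noisy depth prior; shrinking it concentrates samples near the predicted surface. How TD-NeRF actually schedules this spread is described in the paper and is not reproduced here.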

Paper Structure (23 sections, 11 equations, 6 figures, 6 tables)

This paper contains 23 sections, 11 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Comparison with the state-of-the-art depth-based NeRF method NoPe-NeRF [bian2023nope]. RGB images and depth images are rendered by NeRF with a coarse depth map.
  • Figure 2: Overview. The inputs are RGB images without poses. The RGB images are first processed by a pre-trained depth network to obtain depth priors. Then, we employ a truncated normal distribution to optimize the ray sampling of each pixel based on the depth priors with a coarse-to-fine training strategy (①: coarse step, ②: fine step). Subsequently, the sampled points are fed into an MLP to estimate the color $c$ and the density $\sigma$. Next, RGB and depth images are rendered from the color $c$ and the density $\sigma$ via volume rendering. Finally, the radiance field is optimized by supervising depth and RGB. Additionally, we incorporate depth information to calculate GPC and reprojection loss between point clouds, providing constraints for inter-frame pose optimization and refinement.
  • Figure 3: Qualitative comparison of novel view synthesis on the Tanks & Temples (top four rows) and LLFF (bottom two rows) datasets. The rendered RGB and depth images are visualized above. TD-NeRF recovers better details for both RGB and depth geometry, as shown in the red boxes. (The ground-truth depth is generated by DPT [ranftl2021dpt].)
  • Figure 4: Pose estimation comparison. We visualize the camera poses on LLFF (scene: fern). Red: ground truth; blue: predicted pose.
  • Figure 5: Visualization of convergence. The experiment is conducted on the BLEFF dataset (scene: bed1). Blue and green denote NoPe-NeRF and ours, respectively. At the red arrow, the error of our method already reaches the final converged result of NoPe-NeRF.
  • ...and 1 more figure
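
The Figure 2 caption mentions that RGB and depth images are obtained from the per-sample color $c$ and density $\sigma$ by volume rendering. As a rough illustration, the sketch below uses the standard NeRF quadrature for a single ray; the paper's own equations are not shown in this section, so the function name `render_ray` and the choice to treat the last interval as effectively infinite are assumptions.

```python
import numpy as np

def render_ray(t, sigma, color):
    """Composite one ray with the standard NeRF quadrature (illustrative sketch).

    t:     (N,)   sorted sample depths along the ray
    sigma: (N,)   non-negative densities at the samples
    color: (N, 3) RGB values at the samples
    """
    # Distance between adjacent samples; the last interval is treated as very long.
    delta = np.diff(t, append=t[-1] + 1e10)
    # Opacity contributed by each interval.
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance: probability that the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    rgb = (weights[:, None] * color).sum(axis=0)   # rendered pixel color
    depth = (weights * t).sum()                    # rendered (expected) depth
    return rgb, depth, weights

# Toy ray: empty space except for an opaque surface at depth 2.0.
t = np.linspace(1.0, 3.0, 5)
sigma = np.array([0.0, 0.0, 1e3, 0.0, 0.0])
color = np.tile(np.array([0.2, 0.4, 0.6]), (5, 1))
rgb, depth, weights = render_ray(t, sigma, color)
print(rgb, depth)
```

Because the same weights produce both outputs, a depth loss and an RGB loss can supervise the same radiance field, which is the role the caption assigns to "supervising depth and RGB".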