CT-NeRF: Incremental Optimizing Neural Radiance Field and Poses with Complex Trajectory

Yunlong Ran, Yanxu Li, Qi Ye, Yuchi Huo, Zechun Bai, Jiahao Sun, Jiming Chen

TL;DR

CT-NeRF reconstructs neural radiance fields from RGB sequences with unknown camera poses, targeting complex trajectories in particular. It couples incremental NeRF optimization with a pose graph and a reprojection-based geometric image distance derived from dense correspondences. The pipeline combines tracking, windowed local refinement, and global bundle adjustment (BA), and adds a differentiable reprojection loss that guides the optimization of both poses and scene geometry. The key contributions are a local-global BA structure with edges between neighboring poses and a reprojection constraint derived from correspondences; together they improve pose accuracy and novel-view synthesis on challenging datasets. The method works without depth input and without high-quality initial poses, which makes NeRF-based reconstruction more practical for real-world scenes with complex camera motion.

Abstract

Neural radiance fields (NeRF) have achieved impressive results in high-quality 3D scene reconstruction. However, NeRF relies heavily on precise camera poses. While recent works such as BARF have introduced camera pose optimization within NeRF, their applicability is limited to scenes with simple trajectories. Existing methods struggle with complex trajectories involving large rotations. To address this limitation, we propose CT-NeRF, an incremental reconstruction and optimization pipeline that uses only RGB images, without pose or depth input. In this pipeline, we first propose a local-global bundle adjustment under a pose graph connecting neighboring frames. It enforces consistency between poses, which helps escape the local minima that arise when poses are constrained only by consistency with the scene structure. Further, we instantiate the consistency between poses as a reprojected geometric image distance constraint derived from pixel-level correspondences between input image pairs. Through incremental reconstruction, CT-NeRF recovers both camera poses and scene structure and can handle scenes with complex trajectories. We evaluate CT-NeRF on two real-world datasets that feature complex trajectories, NeRFBuster and Free-Dataset. Results show that CT-NeRF outperforms existing methods in novel view synthesis and pose estimation accuracy.
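
To make the pose-graph idea concrete, the following is a minimal sketch of how edges between neighboring frames and a local refinement window might be enumerated as frames arrive. It is an illustration only: the function names, the `radius` parameter, and the `window` size are hypothetical and are not taken from the paper, which may define its graph and window differently.

```python
def neighbor_edges(num_frames, radius=2):
    """Pose-graph edges (i, j) with i < j and j - i <= radius.

    `radius` is a hypothetical parameter: how many following frames
    each frame is linked to.
    """
    return [
        (i, j)
        for i in range(num_frames)
        for j in range(i + 1, min(i + radius, num_frames - 1) + 1)
    ]


def local_window(num_frames, window=3):
    """Indices of the most recent `window` frames (hypothetical size),
    i.e. the poses that a local refinement step would update together."""
    return list(range(max(0, num_frames - window), num_frames))


if __name__ == "__main__":
    # Frames are added one at a time; each step lists the local window
    # and the pose-pose edges available so far.
    for n in range(1, 5):
        print(n, local_window(n), neighbor_edges(n))
```

In this sketch a global bundle adjustment would use every edge returned by `neighbor_edges`, while a local step would only touch the poses in `local_window`.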

Paper Structure

This paper contains 17 sections, 10 equations, 19 figures, 6 tables.

Figures (19)

  • Figure 1: (a) Left: a center-based pose graph, which only enforces consistency between each pose and the scene; Right: our pose graph, which additionally enforces consistency between the camera poses. (b) The reprojection loss (bottom) provides an accurate gradient towards alignment, while the photometric loss (top) provides inconsistent gradients. (c) For a correspondence pair $(q,p)$ between $I_1$ and $I_2$, $q$ is reprojected onto the image plane of $I_2$ using the depth value of $q$. The reprojection loss is the geometric image distance between the reprojected point $p'$ and the ground-truth corresponding point $p$.
  • Figure 2: Our incremental optimization pipeline for neural radiance fields and pose estimation.
  • Figure 3: Qualitative comparison on Free-Dataset (Wang et al., 2023). Rendered views and depths (top-left corner of each image).
  • Figure 4: Trajectory comparison. We visualize the estimated (blue) and COLMAP (red) camera poses. Sparse 3D points for the scenes are from COLMAP. The trajectories of BARF, L2G-NeRF, and Nope-NeRF contain abrupt changes, whereas those of CF-NeRF and ours change steadily. The bottom row shows frames rendered between the two frames at each abrupt change (marked by green rectangles).
  • Figure 5: Qualitative comparison on NeRFbuster (Warburg et al., 2023). Rendered views and depths (top-left corner of each image).
  • ...and 14 more figures
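
The reprojection described in the caption of Figure 1(c) can be sketched with a standard pinhole camera model. This is a minimal illustration, not the authors' implementation: the shared intrinsics `K`, the relative pose `(R_12, t_12)` mapping camera-1 coordinates to camera-2 coordinates, and the function names are assumptions made for the example, and this excerpt does not state how the depth of $q$ is obtained.

```python
import numpy as np


def reproject(q, depth_q, K, R_12, t_12):
    """Reproject pixel q of I_1 into I_2 using its depth (pinhole model).

    Assumptions for this sketch: both images share intrinsics K, and
    (R_12, t_12) maps camera-1 coordinates to camera-2 coordinates.
    """
    q_h = np.array([q[0], q[1], 1.0])           # homogeneous pixel
    X_1 = depth_q * (np.linalg.inv(K) @ q_h)    # back-project to 3D (camera 1)
    X_2 = R_12 @ X_1 + t_12                     # express in camera 2
    x = K @ X_2                                 # project onto image plane of I_2
    return x[:2] / x[2]                         # reprojected point p'


def reprojection_loss(q, p, depth_q, K, R_12, t_12):
    """Geometric image distance between p' and the corresponding point p."""
    return float(np.linalg.norm(reproject(q, depth_q, K, R_12, t_12) - p))


if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 50.0],
                  [0.0, 100.0, 50.0],
                  [0.0, 0.0, 1.0]])
    R_12, t_12 = np.eye(3), np.array([1.0, 0.0, 0.0])
    q, depth_q = np.array([50.0, 50.0]), 2.0
    print(reproject(q, depth_q, K, R_12, t_12))                            # p'
    print(reprojection_loss(q, np.array([103.0, 54.0]), depth_q, K, R_12, t_12))
```

Because the distance is measured in pixels on the image plane of $I_2$, it depends on both the relative pose and the depth of $q$, which is consistent with the caption's point that the loss gives a gradient towards alignment.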