CT-NeRF: Incremental Optimizing Neural Radiance Field and Poses with Complex Trajectory
Yunlong Ran, Yanxu Li, Qi Ye, Yuchi Huo, Zechun Bai, Jiahao Sun, Jiming Chen
TL;DR
CT-NeRF tackles the challenge of reconstructing neural radiance fields from RGB sequences with unknown poses, especially along complex trajectories, by coupling incremental NeRF optimization with a pose-graph framework and a reprojection-based geometric image distance derived from dense correspondences. The method integrates tracking, windowed local refinement, and global bundle adjustment, augmented by a differentiable reprojection loss that guides both pose and scene geometry optimization. Key contributions include a local-global BA structure with pose edges between neighboring frames and a reprojection constraint derived from pixel-level correspondences, yielding improved pose accuracy and novel-view synthesis on challenging datasets. The approach performs robustly without depth input and without relying on high-quality initial poses, advancing NeRF-based reconstruction in real-world, complex-motion scenarios.
Abstract
Neural radiance fields (NeRF) have achieved impressive results in high-quality 3D scene reconstruction, but they rely heavily on precise camera poses. Recent works such as BARF optimize camera poses within NeRF, yet their applicability is limited to scenes with simple trajectories, and existing methods struggle with complex trajectories involving large rotations. To address this limitation, we propose CT-NeRF, an incremental reconstruction and optimization pipeline that uses only RGB images, without pose or depth input. In this pipeline, we first propose a local-global bundle adjustment under a pose graph that connects neighboring frames. The graph enforces consistency between poses, which helps escape the local minima that arise when poses are constrained only by their consistency with the scene structure. We then instantiate this pose-to-pose consistency as a reprojected geometric image distance constraint computed from pixel-level correspondences between input image pairs. Through incremental reconstruction, CT-NeRF recovers both camera poses and scene structure and handles scenes with complex trajectories. We evaluate CT-NeRF on two real-world datasets with complex trajectories, NeRFBuster and Free-Dataset. Results show that CT-NeRF outperforms existing methods in novel view synthesis and pose estimation accuracy.
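
To make the reprojected geometric image distance concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a pinhole camera with intrinsics shared by both images, camera-to-world poses, and a per-pixel depth for image i (for example, one rendered from the radiance field; this is an assumption, since the abstract does not say where depth comes from). The function name `reprojection_distance` and all parameter names are hypothetical. Pixels of image i are back-projected to 3D, transformed into the frame of image j, projected, and compared with their matched pixels in image j.

```python
import numpy as np

def reprojection_distance(uv_i, uv_j, depth_i, K, T_wi, T_wj):
    """Mean pixel distance between matches in image j and points reprojected from image i.

    uv_i, uv_j : (N, 2) matched pixel coordinates in images i and j
    depth_i    : (N,) depth along the camera z-axis for each pixel of image i (assumed given)
    K          : (3, 3) pinhole intrinsics, assumed shared by both images
    T_wi, T_wj : (4, 4) camera-to-world poses of images i and j
    """
    ones = np.ones((uv_i.shape[0], 1))
    # Back-project pixels of image i to 3D points in camera i's frame.
    rays = (np.linalg.inv(K) @ np.hstack([uv_i, ones]).T).T  # (N, 3), z component is 1
    pts_cam_i = rays * depth_i[:, None]
    # Relative transform from camera i to camera j, passing through world coordinates.
    T_ji = np.linalg.inv(T_wj) @ T_wi
    pts_cam_j = (T_ji @ np.hstack([pts_cam_i, ones]).T).T[:, :3]
    # Perspective projection into image j.
    proj = (K @ pts_cam_j.T).T
    uv_proj = proj[:, :2] / proj[:, 2:3]
    # Geometric image distance: Euclidean pixel error averaged over correspondences.
    return np.linalg.norm(uv_proj - uv_j, axis=1).mean()
```

With correct poses and depths the distance is close to zero, and it grows as either pose drifts, which is why such a term can constrain the relative pose along a pose-graph edge. In an actual pipeline the same computation would be written in an automatic-differentiation framework so that gradients reach both the poses and the scene geometry.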
