TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks

Jinjie Mai, Wenxuan Zhu, Sara Rojas, Jesus Zarzar, Abdullah Hamdi, Guocheng Qian, Bing Li, Silvio Giancola, Bernard Ghanem

TL;DR

TrackNeRF addresses the challenge of training neural radiance fields from sparse and noisy views by introducing feature tracks and a global track reprojection loss inspired by bundle adjustment. It combines track extraction and refinement (Track Adjustment), a track-level reprojection objective, and depth regularization into NeRF training to enforce holistic multiview geometry. The approach yields state-of-the-art results on DTU and competitive gains on LLFF, with faster convergence and greater robustness to pose noise, especially as view count grows. This method provides a practical route to high-fidelity novel view synthesis in real-world, imperfect data collection scenarios.

Abstract

Neural radiance fields (NeRFs) generally require many images with accurate poses for accurate novel view synthesis, which does not reflect realistic setups where views can be sparse and poses can be noisy. Previous solutions for learning NeRFs with sparse views and noisy poses only consider local geometry consistency with pairs of views. Closely following bundle adjustment in Structure-from-Motion (SfM), we introduce TrackNeRF for more globally consistent geometry reconstruction and more accurate pose optimization. TrackNeRF introduces feature tracks, i.e., connected pixel trajectories across all visible views that correspond to the same 3D points. By enforcing reprojection consistency among feature tracks, TrackNeRF encourages holistic 3D consistency explicitly. Through extensive experiments, TrackNeRF sets a new benchmark in noisy and sparse view reconstruction. In particular, TrackNeRF shows significant improvements over the state-of-the-art BARF and SPARF by ~8 and ~1 in terms of PSNR, respectively, on DTU under various sparse and noisy view setups. The code is available at https://tracknerf.github.io/.
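To make the notion of a feature track concrete, the following is a minimal sketch of one common way to chain pairwise matches into multi-view tracks, using union-find as in classical SfM pipelines. It is an illustration under stated assumptions, not the paper's Track Adjustment procedure: the function name build_tracks, the input format, and the one-observation-per-view consistency filter are all hypothetical choices made for this example.

```python
from collections import defaultdict


def build_tracks(pairwise_matches):
    """Merge pairwise pixel matches into multi-view feature tracks.

    pairwise_matches: iterable of ((view_a, pixel_a), (view_b, pixel_b)),
    where each pixel is an (x, y) tuple. This input format is an
    assumption made for the sketch.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Each match says two observations see the same 3D point: merge them.
    for obs_a, obs_b in pairwise_matches:
        root_a, root_b = find(obs_a), find(obs_b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Group observations by their connected component.
    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)

    tracks = []
    for observations in groups.values():
        views = [view for view, _ in observations]
        # Assumed filter: a consistent track has at most one pixel per view.
        if len(observations) >= 2 and len(set(views)) == len(views):
            tracks.append(sorted(observations))
    return sorted(tracks)


# Matches 0-1 and 1-2 share an observation, so they chain into one
# three-view track; the remaining match stays a two-view track.
matches = [
    ((0, (10, 10)), (1, (12, 11))),
    ((1, (12, 11)), (2, (15, 9))),
    ((0, (50, 50)), (2, (48, 52))),
]
tracks = build_tracks(matches)
```

The point of the chaining step is that views 0 and 2 end up constrained against each other even though no direct match between them was given, which is what distinguishes a track from a set of independent pairs.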

Paper Structure (30 sections, 9 equations, 9 figures, 10 tables)


Figures (9)

  • Figure 2: Illustration of Track Reprojection Loss. Left: Pairwise correspondence objective employed by CorresNeRF [corresnerf] and SPARF [sparf2023]. Right: Feature tracks objective proposed by TrackNeRF. TrackNeRF minimizes the reprojection loss across all visible views for feature tracks corresponding to the same landmarks.
  • Figure 3: Qualitative Comparison on DTU [dtu] and LLFF [llff]. We show views from the test split of both datasets to visually compare our TrackNeRF renderings to the baselines. For the DTU dataset, where GT depth maps are available, we additionally visualize the rendered depth (Eq. [eq:rendered-depth]) to compare the learned geometry.
  • Figure 4: Visualization of Feature Tracks. In almost all scenes, we find sufficiently dense and accurate correspondences, as in the example from DTU scan 21 shown in Fig. [fig:track_vis_a]. Fig. [fig:track_vis_b] shows a rare case (scan 30) where the correspondence network [pdcnet++] fails to find enough reliable correspondences; in such cases we use a lower $\lambda_{Track}$ for better performance.
  • Figure 5: Comparison of the Convergence of Pose Optimization. We show convergence plots of BARF [barf], SPARF [sparf2023], and our TrackNeRF on the DTU and LLFF datasets. For a fair comparison, we sample the same number of rays per iteration as SPARF [sparf2023]. Plots with white and gray backgrounds represent rotation and translation errors, respectively. Our TrackNeRF converges faster and to a lower loss than the state-of-the-art.
  • Figure I: Visualization of Camera Poses under 15% of Noise, where teal and purple tetrahedrons indicate recovered and ground-truth camera poses, respectively. While BARF [barf] cannot recover the camera poses well, both SPARF [sparf2023] and our TrackNeRF recover near-perfect camera poses under 15% of noise.
  • ...and 4 more figures
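
The difference between the pairwise objective and the track objective described in the Figure 2 caption can be sketched as follows. This is a simplified illustration, not the paper's loss: it assumes a pinhole camera with intrinsics (fx, fy, cx, cy), world-to-camera poses given as a rotation R and translation t, a per-pixel depth supplied directly (in TrackNeRF it would come from the NeRF's rendered depth), and a plain Euclidean pixel error. All function names here are hypothetical.

```python
import math


def world_to_camera(R, t, p):
    # p_cam = R @ p_world + t, with R as a 3x3 nested list.
    return [sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3)]


def camera_to_world(R, t, p):
    # p_world = R^T @ (p_cam - t)
    d = [p[r] - t[r] for r in range(3)]
    return [sum(R[r][c] * d[r] for r in range(3)) for c in range(3)]


def project(intr, R, t, p_world):
    fx, fy, cx, cy = intr
    x, y, z = world_to_camera(R, t, p_world)
    return (fx * x / z + cx, fy * y / z + cy)


def backproject(intr, R, t, pixel, depth):
    fx, fy, cx, cy = intr
    p_cam = [(pixel[0] - cx) / fx * depth, (pixel[1] - cy) / fy * depth, depth]
    return camera_to_world(R, t, p_cam)


def track_reprojection_loss(track, depths, intr, poses):
    """Mean pixel error over every ordered pair of views in one track.

    track:  dict view_id -> (x, y) observation of the same landmark
    depths: dict view_id -> z-depth at that observation
    poses:  dict view_id -> (R, t) world-to-camera pose
    """
    errors = []
    for i, pixel_i in track.items():
        # Lift the observation in view i to a 3D point ...
        point = backproject(intr, *poses[i], pixel_i, depths[i])
        for j, pixel_j in track.items():
            if i == j:
                continue
            # ... and compare its projection in every other view j
            # of the track with the tracked pixel there.
            errors.append(math.dist(project(intr, *poses[j], point), pixel_j))
    return sum(errors) / len(errors)


# Synthetic three-view track of one landmark with exact poses and depths.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
intr = (500.0, 500.0, 320.0, 240.0)
poses = {
    0: (IDENTITY, [0.0, 0.0, 0.0]),
    1: (IDENTITY, [-0.5, 0.0, 0.0]),
    2: (IDENTITY, [0.3, -0.2, 0.1]),
}
landmark = [0.2, -0.1, 4.0]
track = {v: project(intr, R, t, landmark) for v, (R, t) in poses.items()}
depths = {v: world_to_camera(R, t, landmark)[2] for v, (R, t) in poses.items()}

clean_loss = track_reprojection_loss(track, depths, intr, poses)

# Perturbing one camera's translation breaks consistency across the track.
noisy_poses = dict(poses)
noisy_poses[1] = (IDENTITY, [-0.4, 0.0, 0.0])
noisy_loss = track_reprojection_loss(track, depths, intr, noisy_poses)
```

A pairwise objective would evaluate only the view pairs for which a direct match exists, whereas the loop above couples every view in the track with every other. With exact poses and depths the loss is numerically zero; perturbing a single pose raises the error in all pairs that involve that view, which is the signal a pose optimizer would follow.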