NAP3D: NeRF Assisted 3D-3D Pose Alignment for Autonomous Vehicles
Gaurav Bansal
TL;DR
This work tackles pose drift in autonomous-vehicle localization by introducing NAP3D, which refines camera pose using 3D-3D correspondences between the current depth view and a pre-trained NeRF, without relying on loop closure. The method combines NeRF rendering, depth-camera measurements, and Procrustes analysis to compute a rigid SE(3) transform that corrects accumulated error, with RANSAC-based outlier rejection for robustness. Experiments on a custom dataset and the TUM RGB-D benchmark show that NAP3D achieves sub-5 cm RMSE in controlled settings and consistently lowers 3D alignment RMSE relative to a 2D-3D Perspective-n-Point (PnP) baseline, indicating improved geometric consistency in 3D. The approach is lightweight and dataset-agnostic, offering a practical companion to SLAM pipelines when loop closure is unavailable and potentially extending to other radiance-field representations.
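The Procrustes step with RANSAC outlier rejection described above can be illustrated with a minimal NumPy sketch. It assumes the 3D-3D correspondences are already available as two matched arrays; the function names `kabsch` and `ransac_kabsch`, the iteration count, and the 5 cm inlier threshold are illustrative assumptions, not the paper's implementation or settings.

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t.

    src, dst: (N, 3) arrays of matched 3D points, N >= 3.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def ransac_kabsch(src, dst, n_iters=200, inlier_thresh=0.05, seed=0):
    """RANSAC wrapper: fit on random 3-point samples, keep the largest
    consensus set, then refit on all of its inliers.

    n_iters and inlier_thresh (metres) are assumed values for illustration.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[idx], dst[idx])
        residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residuals < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("RANSAC found fewer than 3 inliers")
    R, t = kabsch(src[best_inliers], dst[best_inliers])
    return R, t, best_inliers
```

In this sketch, `src` would hold points back-projected from the current depth view and `dst` the matched points synthesized from the NeRF; the returned `(R, t)` is the rigid correction applied to the drifted pose estimate.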
Abstract
Accurate localization is essential for autonomous vehicles, yet sensor noise and drift accumulate over time and can lead to significant pose estimation errors, particularly over long horizons. A common strategy for correcting accumulated error is visual loop closure in SLAM, which adjusts the pose graph when the agent revisits previously mapped locations. These techniques typically rely on identifying visual correspondences between the current view and previously observed scenes and often require fusing data from multiple sensors. In contrast, this work introduces NeRF-Assisted 3D-3D Pose Alignment (NAP3D), a complementary approach that leverages 3D-3D correspondences between the agent's current depth image and a pre-trained Neural Radiance Field (NeRF). By directly aligning 3D points from the observed scene with synthesized points from the NeRF, NAP3D refines the estimated pose even from novel viewpoints, without requiring the agent to revisit previously observed locations. This robust 3D-3D formulation provides advantages over conventional 2D-3D localization methods while remaining comparable in accuracy and applicability. Experiments demonstrate that NAP3D achieves camera pose correction within 5 cm on a custom dataset, robustly outperforming a 2D-3D Perspective-n-Point (PnP) baseline. On TUM RGB-D, NAP3D consistently improves 3D alignment RMSE by approximately 6 cm compared to this baseline across varying noise levels, even though PnP achieves lower raw rotation and translation parameter error in some regimes; this highlights NAP3D's improved geometric consistency in 3D space. By providing a lightweight, dataset-agnostic tool, NAP3D complements existing SLAM and localization pipelines when traditional loop closure is unavailable.
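To make the 3D side of this comparison concrete, the sketch below lifts a depth image to camera-frame 3D points with a pinhole model and computes an RMSE between two matched point sets. The pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the function names, and this particular RMSE definition (root mean squared Euclidean distance over matched points) are assumptions for illustration; the abstract does not specify how the paper defines its 3D alignment RMSE.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Lift a depth image (H, W), in metres, to camera-frame points (N, 3).

    Pixels with non-positive depth are treated as invalid and skipped.
    Points are returned in row-major pixel order.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column / row grids
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def alignment_rmse(points_a, points_b):
    """Root mean squared Euclidean distance between matched (N, 3) point sets."""
    sq_dist = np.sum((points_a - points_b) ** 2, axis=1)
    return float(np.sqrt(np.mean(sq_dist)))
```

Under this definition, an error measured directly on 3D points can rank two pose estimates differently from a comparison of raw rotation and translation parameters, which is consistent with the distinction the abstract draws between the two kinds of error.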
