NAP3D: NeRF Assisted 3D-3D Pose Alignment for Autonomous Vehicles

Gaurav Bansal

TL;DR

This work tackles pose drift in autonomous localization by introducing NAP3D, which uses NeRF-derived 3D-3D correspondences between the current depth view and a pre-trained NeRF to refine camera pose without relying on loop closure. The method combines NeRF rendering, depth-camera measurements, and Procrustes analysis to compute a rigid SE(3) transform that corrects accumulated error, with robustness enhanced by RANSAC-based outlier rejection. Experiments on a custom dataset and the TUM RGB-D benchmark show that NAP3D achieves sub-5 cm RMSE in controlled settings and consistently lowers 3D alignment RMSE relative to a 2D-3D PnP baseline, highlighting improved geometric consistency in 3D. The approach is lightweight and dataset-agnostic, offering a practical companion to SLAM pipelines when loop closure is unavailable and potentially extending to other radiance-field representations.

Abstract

Accurate localization is essential for autonomous vehicles, yet sensor noise and drift over time can lead to significant pose estimation errors, particularly in long-horizon environments. A common strategy for correcting accumulated error is visual loop closure in SLAM, which adjusts the pose graph when the agent revisits previously mapped locations. These techniques typically rely on identifying visual mappings between the current view and previously observed scenes and often require fusing data from multiple sensors. In contrast, this work introduces NeRF-Assisted 3D-3D Pose Alignment (NAP3D), a complementary approach that leverages 3D-3D correspondences between the agent's current depth image and a pre-trained Neural Radiance Field (NeRF). By directly aligning 3D points from the observed scene with synthesized points from the NeRF, NAP3D refines the estimated pose even from novel viewpoints, without relying on revisiting previously observed locations. This robust 3D-3D formulation provides advantages over conventional 2D-3D localization methods while remaining comparable in accuracy and applicability. Experiments demonstrate that NAP3D achieves camera pose correction within 5 cm on a custom dataset, robustly outperforming a 2D-3D Perspective-N-Point baseline. On TUM RGB-D, NAP3D consistently improves 3D alignment RMSE by approximately 6 cm compared to this baseline given varying noise, despite PnP achieving lower raw rotation and translation parameter error in some regimes, highlighting NAP3D's improved geometric consistency in 3D space. By providing a lightweight, dataset-agnostic tool, NAP3D complements existing SLAM and localization pipelines when traditional loop closure is unavailable.
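The alignment step described above (Procrustes analysis wrapped in RANSAC-based outlier rejection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `kabsch`, `ransac_kabsch`, and `alignment_rmse`, the iteration count, and the 5 cm inlier threshold are all hypothetical choices. It also assumes that matched 3D points from the NeRF (`src`) and from the depth camera (`dst`) are already available as N x 3 arrays in metres.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= R @ src + t."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def alignment_rmse(src, dst, R, t):
    """Root-mean-square 3D distance between transformed src and dst."""
    residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
    return float(np.sqrt(np.mean(residuals ** 2)))


def ransac_kabsch(src, dst, n_iters=200, inlier_thresh=0.05, seed=0):
    """Fit (R, t) on random 3-point samples and keep the largest inlier set.

    n_iters and inlier_thresh (metres) are illustrative defaults only.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[idx], dst[idx])
        residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residuals < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("RANSAC found fewer than 3 inlier correspondences")
    # Refit on all inliers of the best hypothesis.
    R, t = kabsch(src[best_inliers], dst[best_inliers])
    return R, t, best_inliers
```

The returned `R` and `t` form the rigid SE(3) correction. In a pipeline like the one summarized here, that correction would be composed with the drifted pose estimate, and `alignment_rmse` corresponds to the kind of 3D alignment metric the abstract reports.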

Paper Structure

This paper contains 27 sections, 5 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Diagram of the algorithm's structure. The system is divided into Neural Radiance Field, Onboard Image, and Autonomous Agent processors, each of which controls a separate part of the system's logic.
  • Figure 2: Images generated from the NeRF and RealSense, with corresponding keypoints indicating the epipole of each image. Point clouds show the 3D coordinates of the NeRF (red) and real-life (blue) keypoints, with the NeRF and RealSense centroids marked in green and yellow, respectively.
  • Figure 3: TUM RGB-D and NeRF RGB and depth images at the ground-truth position and perturbed position, respectively. Note that while both depth images represent data fidelity, the TUM RGB-D depth image encodes actual depth values in a 16-bit grayscale format.
  • Figure 4: Mean per-frame difference between PnP-RANSAC and NAP3D error (PnP - NAP3D). A positive value in a cell indicates that NAP3D has the smaller error; a negative value indicates that NAP3D has the larger error.
  • Figure 5: Qualitative comparison of pose correction in a frame with no blur or intrinsics noise applied. PnP-RANSAC's substantially higher rotational error leads to pronounced geometric misalignment and increased 3D RMSE, even though it achieves lower translational error. In contrast, NAP3D preserves rotational consistency and achieves significantly lower 3D alignment error (a decrease of $\sim 9.24$ cm), illustrating a failure mode of 2D-3D pose estimation that is mitigated by 3D-3D alignment.
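
The Figure 3 caption notes that TUM RGB-D depth images store actual depth values in a 16-bit format. As a rough sketch of how such a depth image could be turned into the 3D keypoints used for 3D-3D alignment, the example below back-projects pixel coordinates through a pinhole camera model. The function name `backproject` and the intrinsics in the usage note are illustrative assumptions, not values taken from the paper. The default scale of 5000 raw units per metre follows the TUM RGB-D depth format, where a raw value of 0 marks a missing measurement.

```python
import numpy as np


def backproject(depth_raw, u, v, fx, fy, cx, cy, depth_scale=5000.0):
    """Lift integer pixel keypoints (u, v) to 3D camera-frame points in metres.

    depth_raw is a 2D uint16 array indexed as [row, column]. A raw value of 0
    is treated as "no measurement" and the corresponding keypoint is dropped.
    Returns (points, valid), where valid is a boolean mask over the inputs.
    """
    u = np.asarray(u)
    v = np.asarray(v)
    z = depth_raw[v, u].astype(np.float64) / depth_scale
    valid = z > 0
    # Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=1)
    return points[valid], valid
```

For instance, calling `backproject(depth, u, v, fx=525.0, fy=525.0, cx=319.5, cy=239.5)` with nominal TUM-style intrinsics (an assumption here; calibrated per-sensor values would normally be used) yields one 3D point per keypoint that has valid depth. The same lifting applied to NeRF-rendered depth would give the second point set for the alignment.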