Marrying NeRF with Feature Matching for One-step Pose Estimation

Ronghan Chen; Yang Cong; Yu Ren

TL;DR

This work addresses real-time, CAD-free object pose estimation by integrating image matching with Neural Radiance Fields (NeRF). By rendering a NeRF from an initial pose, extracting 2D-2D matches via LoFTR, lifting to 3D using NeRF depth, and solving pose with PnP+RANSAC in one step, the method achieves fast, robust estimates and avoids lengthy optimization or heavy training for novel objects. To improve reliability, it introduces 3D consistent point mining to discard unreliable NeRF-derived points and a keypoint-guided occlusion-robust refinement to mitigate occlusion effects; experiments show up to 90× efficiency gains and real-time 6 FPS performance, with strong accuracy on synthetic and real datasets. The approach offers practical benefits for robotics and AR by delivering CAD-free, data-efficient pose estimation with improved occlusion handling and robustness.

Abstract

Given an image collection of an object, we aim to build a real-time image-based pose estimation method that requires neither a CAD model nor hours of object-specific training. Recent NeRF-based methods provide a promising solution by directly optimizing the pose from a pixel loss between rendered and target images. However, during inference they need a long time to converge and suffer from local minima, making them impractical for real-time robot applications. We address this problem by marrying image matching with NeRF. With 2D matches and depth rendered by NeRF, we directly solve the pose in one step by building 2D-3D correspondences between the target and initial views, thus allowing for real-time prediction. To improve the accuracy of the 2D-3D correspondences, we propose a 3D consistent point mining strategy, which effectively discards unfaithful points reconstructed by NeRF. In addition, current NeRF-based methods that naively optimize a pixel loss fail on occluded images, so we further propose a sampling strategy based on 2D matches that excludes the occluded area. Experimental results on representative datasets show that our method outperforms state-of-the-art methods and improves inference efficiency by 90x, achieving real-time prediction at 6 FPS.
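The lifting step mentioned in the abstract (2D matches plus rendered depth giving 3D points) can be illustrated with a minimal sketch. This is not the authors' code: the function name `lift_matches_to_3d`, the pinhole intrinsics `K`, the camera-to-world matrix `cam_to_world`, and the assumption that the depth map stores z-depth along the optical axis (OpenCV-style camera axes) are all choices made here for illustration. A real NeRF renderer may output ray distance or use a different axis convention, in which case a conversion would be needed first.

```python
import numpy as np

def lift_matches_to_3d(pixels_uv, depth, K, cam_to_world):
    """Back-project matched pixels (u, v) of the rendered view into world space.

    Assumes a pinhole camera, z-depth along the optical axis, and a 4x4
    camera-to-world matrix (all assumptions of this sketch).
    """
    uv = np.asarray(pixels_uv, dtype=np.float64)
    cols = np.round(uv[:, 0]).astype(int)
    rows = np.round(uv[:, 1]).astype(int)
    z = depth[rows, cols]                      # nearest-pixel depth lookup
    valid = z > 0                              # flag pixels with no rendered surface
    uv1 = np.column_stack([uv, np.ones(len(uv))])
    rays = (np.linalg.inv(K) @ uv1.T).T        # normalized camera rays with z = 1
    pts_cam = rays * z[:, None]                # scale each ray by its depth
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    pts_world = pts_cam @ R.T + t              # move points into the world frame
    return pts_world, valid

# Toy inputs: a flat depth map 2 units from a camera shifted by +1 along x.
K_demo = np.array([[100.0, 0.0, 32.0],
                   [0.0, 100.0, 24.0],
                   [0.0, 0.0, 1.0]])
depth_demo = np.full((48, 64), 2.0)
pose_demo = np.eye(4)
pose_demo[:3, 3] = [1.0, 0.0, 0.0]
matches_demo = np.array([[32.0, 24.0], [42.0, 24.0]])
pts_demo, valid_demo = lift_matches_to_3d(matches_demo, depth_demo, K_demo, pose_demo)
```

Each lifted point is paired with the pixel in the target image that it was matched to, which gives the 2D-3D correspondences the pose solver consumes.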

Paper Structure (27 sections, 7 equations, 6 figures, 2 tables)

Introduction
Related Works
Deep Learning Based Pose Estimation
Render-and-compare Based Pose Estimation
Keypoint-Matching-Based Pose Estimation
Background
Method
One-step Pose Estimation via Feature Matching
Matching
Lifting
PnP
3D Consistent Point Mining
Keypoint-guided Occlusion Robust Refinement
Experiments
Comparison Methods
...and 12 more sections

Figures (6)

Figure 1: Given an object image with unknown pose, we propose a NeRF-based pose estimation method that reduces the hundreds of optimization steps of former NeRF-based methods to a single step, while avoiding local minima and obtaining more accurate poses. As a result, with only 5 minutes of training of a fast NeRF (Instant-NGP), our method achieves CAD-model-free, real-time pose estimation on novel objects at 6 FPS.
Figure 2: Framework of the one-step pose estimation via feature matching. Given the initial pose, we use a NeRF (Instant-NGP) to render an RGB image $I_r$ and a depth image $D$. Then, an off-the-shelf image matcher (LoFTR) generates 2D-2D matches between the rendered and target images. Given the location of each matched 2D point and its depth rendered by NeRF, its 3D coordinates can be obtained, forming 2D-3D matches from which the pose is finally solved via PnP+RANSAC.
Figure 3: Qualitative results of pose estimation on the NeRF synthetic dataset and the real-world LLFF dataset. We visualize the results by overlaying the target image and the NeRF rendering from the estimated pose.
Figure 4: Qualitative results of pose estimation on synthesized occluded data. The comparison methods fail to align the occluded images after hundreds of iterations, while our method aligns well in one step.
Figure 5: Visualization of the points discarded by the 3D consistent point mining strategy.
...and 1 more figures
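The last stage in the Figure 2 caption solves the pose from 2D-3D matches. As a rough illustration of what such a solver computes, the sketch below uses a plain Direct Linear Transform (DLT) in NumPy. This is a swapped-in, simpler technique and not the PnP+RANSAC solver named in the paper: it has no outlier rejection and needs at least six non-coplanar points. In practice a library routine such as OpenCV's `solvePnPRansac` would typically be used instead. The function name `solve_pose_dlt` and the synthetic data are hypothetical.

```python
import numpy as np

def solve_pose_dlt(pts_world, pixels_uv, K):
    """Estimate world-to-camera (R, t) from >= 6 non-coplanar 2D-3D matches.

    Plain DLT without RANSAC; a stand-in for PnP+RANSAC in this sketch.
    """
    X = np.asarray(pts_world, dtype=np.float64)
    uv = np.asarray(pixels_uv, dtype=np.float64)
    n = len(X)
    # Normalized image coordinates remove the intrinsics from the problem.
    xn = (np.linalg.inv(K) @ np.column_stack([uv, np.ones(n)]).T).T
    Xh = np.column_stack([X, np.ones(n)])
    # Two linear equations per match in the 12 entries of P = s * [R | t].
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = Xh
    A[0::2, 8:12] = -xn[:, 0:1] * Xh
    A[1::2, 4:8] = Xh
    A[1::2, 8:12] = -xn[:, 1:2] * Xh
    # The right singular vector of the smallest singular value spans the solution.
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)
    # Project the left 3x3 block onto a rotation and recover the signed scale.
    U, S, Vt = np.linalg.svd(P[:, :3])
    R, scale = U @ Vt, S.mean()
    if np.linalg.det(R) < 0:
        R, scale = -R, -scale
    t = P[:, 3] / scale
    return R, t

# Synthetic check: project random points with a known pose, then recover it.
rng = np.random.default_rng(0)
pts_world = rng.uniform(-1.0, 1.0, size=(20, 3))
cz, sz, cx, sx = np.cos(0.3), np.sin(0.3), np.cos(0.2), np.sin(0.2)
Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
R_true, t_true = Rz @ Rx, np.array([0.1, -0.2, 4.0])
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
proj = (K @ (pts_world @ R_true.T + t_true).T).T
uv = proj[:, :2] / proj[:, 2:3]
R_est, t_est = solve_pose_dlt(pts_world, uv, K)
```

Because the matches here are noise-free, the recovered pose agrees with the ground truth up to numerical precision. Real matches contain outliers, which is why a robust wrapper such as RANSAC matters in practice.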
