Map-Free Visual Relocalization Enhanced by Instance Knowledge and Depth Knowledge

Mingyu Xiao; Runze Chen; Haiyong Luo; Fang Zhao; Juan Wang; Xuepeng Ma

TL;DR

This work tackles map-free relocalization under monocular vision by introducing instance knowledge to constrain feature-point matching and depth knowledge to enable scale recovery. The method combines instance-level and global matching using SegGPT and DUSt3R, computes a rotation $R$ and a scale-free translation $\bar{t}$ from the essential matrix, and applies Metric3D-derived depth to recover a scaled translation $t = s\cdot\bar{t}$ through robust 3D correspondences and RANSAC. Ablation studies confirm that both instance knowledge and depth knowledge significantly improve rotation and translation accuracy, with their combination achieving state-of-the-art results on the map-free relocalization dataset. The approach advances map-free localization by reducing large matching errors and enabling accurate metric scale without pre-built maps, with broad implications for autonomous navigation and augmented reality.
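The scale-recovery step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the pose convention $X_2 = R X_1 + s\cdot\bar{t}$ for matched 3D points $X_1$, $X_2$ in the two camera frames, a pinhole intrinsic matrix `K`, and a simple one-point RANSAC over the scale $s$. The function names `backproject` and `recover_scale_ransac`, the threshold, and the iteration count are all hypothetical choices made for illustration.

```python
import numpy as np

def backproject(uv, depth, K):
    """Lift pixels (N, 2), given as (x, y), with metric z-depths (N,)
    to 3D points in the camera frame using pinhole intrinsics K."""
    ones = np.ones((len(uv), 1))
    rays = np.hstack([uv, ones]) @ np.linalg.inv(K).T  # rows are K^-1 [x, y, 1]
    return rays * depth[:, None]

def recover_scale_ransac(R, t_bar, X1, X2, thresh=0.1, iters=100, seed=0):
    """Estimate s in X2 ~= R @ X1 + s * t_bar (assumed convention).

    Each correspondence votes for a scale; the scale with the most
    inliers wins and is refined by averaging over its inliers.
    """
    rng = np.random.default_rng(seed)
    t_bar = t_bar / np.linalg.norm(t_bar)   # ensure unit direction
    d = X2 - X1 @ R.T                       # part of the motion left to translation
    s_all = d @ t_bar                       # per-point least-squares scale
    best = np.zeros(len(d), dtype=bool)
    for _ in range(iters):
        s = s_all[rng.integers(len(d))]     # hypothesis from one correspondence
        inliers = np.linalg.norm(d - s * t_bar, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if not best.any():
        raise ValueError("no inlier correspondences found")
    return float(s_all[best].mean()), best
```

Given `s`, the metric translation follows as `t = s * t_bar`. A one-point hypothesis is enough here because, once $R$ and $\bar{t}$ are fixed, the scale is the only remaining unknown; correspondences with poor depth estimates then show up as outliers rather than biasing the average.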

Abstract

Map-free relocalization is crucial for applications in autonomous navigation and augmented reality, where relying on pre-built maps is often impractical. Map-free relocalization faces significant challenges due to limitations in matching methods and the inherent lack of scale in monocular images. These issues lead to substantial rotational and metric errors, and even localization failures, in real-world scenarios. Large matching errors significantly impact the overall relocalization process, affecting both rotational and translational accuracy. Due to the inherent limitations of the camera itself, recovering the metric scale from a single image is crucial, as it significantly impacts the translation error. To address these challenges, we propose a map-free relocalization method enhanced by instance knowledge and depth knowledge. By leveraging instance-based matching information to improve global matching results, our method significantly reduces the possibility of mismatching across different objects. The robustness of instance knowledge across the scene helps the feature point matching model focus on relevant regions and enhances matching accuracy. Additionally, we use metric depth estimated from a single image to reduce metric errors and improve scale recovery accuracy. By integrating methods dedicated to mitigating large translational and rotational errors, our approach demonstrates superior performance in map-free relocalization.
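The idea of using instance knowledge to suppress mismatches across different objects can be shown in its simplest form. The sketch below is an assumption-laden illustration, not the paper's procedure (which matches both globally and within identified masks): it assumes each image comes with an integer label map in which the same instance carries the same ID in both views, with 0 as background, and it keeps only matches whose two endpoints share a label. The function name `filter_matches_by_instance` is hypothetical.

```python
import numpy as np

def filter_matches_by_instance(uv1, uv2, labels1, labels2):
    """Return a boolean mask over matches whose endpoints share an instance ID.

    uv1, uv2: (N, 2) matched pixel coordinates as (x, y), inside the image.
    labels1, labels2: (H, W) integer label maps with IDs that are assumed
    to be consistent across the two views (0 = background).
    """
    x1, y1 = np.round(uv1).astype(int).T
    x2, y2 = np.round(uv2).astype(int).T
    l1 = labels1[y1, x1]   # label under each keypoint in image 1
    l2 = labels2[y2, x2]   # label under its match in image 2
    return l1 == l2        # cross-object matches are rejected
```

Background-to-background matches are kept under this rule, so global matches outside any mask still contribute; only matches that jump from one object to another, or from an object to the background, are dropped.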

Paper Structure

This paper contains 11 sections, 7 equations, 3 figures, 2 tables, 1 algorithm.

Introduction
Method
Instance-enhanced Matching
Pose Estimation
Depth-enhanced Scale Recovery
Experiment
Main Results
Ablation Study
Instance Knowledge
Depth Knowledge
Conclusion

Figures (3)

Figure 1: The matched points on foreground objects exhibit slightly higher confidence, but this preference is not significant.
Figure 2: Overview. Our model first takes two images as input and obtains aligned point maps. Then, utilizing instance segmentation knowledge, feature points are matched both globally and within identified masks. The matched points are then passed to an essential matrix solver, which computes the rotation matrix $R$ and a scale-free translation vector $\bar{t}$. Subsequently, depth knowledge is applied to project the feature points into 3D space, allowing for the recovery of a scaled translation vector $t$.
Figure 3: The CDF of pose estimation errors across all scenes.
