Map-Free Visual Relocalization Enhanced by Instance Knowledge and Depth Knowledge
Mingyu Xiao, Runze Chen, Haiyong Luo, Fang Zhao, Juan Wang, Xuepeng Ma
TL;DR
This work tackles map-free relocalization under monocular vision by introducing instance knowledge to constrain feature-point matching and depth knowledge to enable scale recovery. The method combines instance-level and global matching using SegGPT and DUSt3R, computes a rotation $R$ and a scale-free translation $\bar{t}$ from the essential matrix, and applies Metric3D-derived depth to recover a scaled translation $t = s\cdot\bar{t}$ through robust 3D correspondences and RANSAC. Ablation studies confirm that both instance knowledge and depth knowledge significantly improve rotation and translation accuracy, with their combination achieving state-of-the-art results on the map-free relocalization dataset. The approach advances map-free localization by reducing large matching errors and enabling accurate metric scale without pre-built maps, with broad implications for autonomous navigation and augmented reality.
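The scale-recovery step $t = s\cdot\bar{t}$ can be illustrated with a minimal sketch. It is one plausible reading of the summary above, not the authors' implementation. It assumes that matched pixels in both images have already been back-projected to 3D points with metric depth, so each pair should satisfy $X_2 = R X_1 + s\,\bar{t}$. The function names, the threshold `thresh`, and the sampling scheme are hypothetical choices made for illustration.

```python
import random

def rotate(R, x):
    """Apply a 3x3 rotation (list of rows) to a 3-vector."""
    return [sum(R[i][j] * x[j] for j in range(3)) for i in range(3)]

def residual(R, t_bar, s, x1, x2):
    """Euclidean error of the model X2 = R X1 + s * t_bar for one pair."""
    rx1 = rotate(R, x1)
    return sum((x2[k] - rx1[k] - s * t_bar[k]) ** 2 for k in range(3)) ** 0.5

def scale_candidate(R, t_bar, x1, x2):
    """Least-squares scale for one pair: projection of (X2 - R X1) onto t_bar."""
    rx1 = rotate(R, x1)
    num = sum((x2[k] - rx1[k]) * t_bar[k] for k in range(3))
    return num / sum(c * c for c in t_bar)

def recover_scale(R, t_bar, pts1, pts2, thresh=0.05, iters=100, seed=0):
    """RANSAC-style scale estimate (assumed scheme, not taken from the paper).

    pts1, pts2: matched 3D points from metric depth, in each camera's frame.
    thresh: inlier distance in the same units as the depth (hypothetical value).
    Returns (s, t) with t = s * t_bar.
    """
    pairs = list(zip(pts1, pts2))
    if not pairs:
        raise ValueError("no correspondences given")
    rng = random.Random(seed)
    cands = [scale_candidate(R, t_bar, x1, x2) for x1, x2 in pairs]
    best = []
    for _ in range(iters):
        s = rng.choice(cands)  # hypothesis from a single correspondence
        inliers = [c for c, (x1, x2) in zip(cands, pairs)
                   if residual(R, t_bar, s, x1, x2) < thresh]
        if len(inliers) > len(best):
            best = inliers
    if not best:
        raise ValueError("no consistent scale found")
    s = sum(best) / len(best)  # refine on the consensus set
    return s, [s * c for c in t_bar]
```

Because a single correspondence is enough to hypothesize $s$, each iteration samples one pair; pairs whose depth or match is wrong produce hypotheses with few inliers and are discarded.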
Abstract
Map-free relocalization is crucial for applications in autonomous navigation and augmented reality, where relying on pre-built maps is often impractical. However, it faces significant challenges due to limitations in matching methods and the inherent lack of scale in monocular images. These issues lead to substantial rotational and metric errors, and even localization failures, in real-world scenarios. Large matching errors degrade the entire relocalization process, affecting both rotational and translational accuracy. Because a monocular camera cannot observe metric scale directly, recovering the scale from a single image is essential, and its accuracy strongly affects the translation error. To address these challenges, we propose a map-free relocalization method enhanced by instance knowledge and depth knowledge. By leveraging instance-based matching information to improve global matching results, our method significantly reduces the likelihood of mismatches across different objects. The robustness of instance knowledge across the scene helps the feature-point matching model focus on relevant regions and improves matching accuracy. Additionally, we use metric depth estimated from a single image to reduce metric errors and improve scale recovery accuracy. By integrating methods dedicated to mitigating large translational and rotational errors, our approach demonstrates superior performance in map-free relocalization.
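The idea of using instance knowledge to suppress mismatches across different objects can be sketched as a simple filter. This is an assumption-laden illustration, not the method described in the paper: it supposes that each image comes with a per-pixel instance label map whose IDs are consistent between the two views (for example, the same object carries the same ID in both), and it simply drops global matches whose endpoints carry different labels. The function name and data layout are hypothetical.

```python
def filter_by_instance(matches, labels1, labels2):
    """Keep matches whose two endpoints lie on the same instance.

    matches: list of ((u1, v1), (u2, v2)) integer pixel coordinates (x, y).
    labels1, labels2: 2D lists indexed as labels[v][u], holding instance IDs
    assumed to be consistent across the two images (0 can mean background).
    """
    kept = []
    for (u1, v1), (u2, v2) in matches:
        if labels1[v1][u1] == labels2[v2][u2]:
            kept.append(((u1, v1), (u2, v2)))
    return kept
```

A match between a pixel on one object and a pixel on a different object is rejected before pose estimation, which is one way such cross-object errors could be kept out of the essential-matrix fit.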
