Cross-Modal Visual Relocalization in Prior LiDAR Maps Utilizing Intensity Textures
Qiyuan Shen, Hengwang Zhao, Weihao Yan, Chunxiang Wang, Tong Qin, Ming Yang
TL;DR
The paper tackles cross-modal visual relocalization: localizing a camera image within a prior LiDAR map using LiDAR intensity textures. It introduces a three-module pipeline: offline map projection to panoramic intensity images with an enhanced HEC projection, coarse retrieval with covisibility-based reordering, and fine relocalization via two-stage 2D-3D association, covisibility inlier selection, and PnP with RANSAC to recover a 6DoF pose. Key contributions include (1) intensity-driven map image databases for robust cross-modal matching, (2) covisibility clustering to reduce outliers in retrieval, and (3) a two-stage local feature matching scheme that yields reliable 2D-3D correspondences for accurate pose estimation. Experiments on self-collected campus datasets show strong place recognition and pose estimation performance, illustrating the practical value for initializing SLAM and loop closure in cross-modal settings.
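The covisibility-based filtering of retrieval results can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `covisibility_filter`, the `min_shared` threshold, and the rule of keeping the largest connected cluster are all assumptions made for illustration. Each retrieved map image is assumed to come with the set of map point IDs it observes; two images count as covisible when they share enough points.

```python
def covisibility_filter(candidates, visible_points, min_shared=2):
    """Keep the largest covisibility cluster among retrieved map images.

    candidates: image ids ordered by retrieval rank (best first).
    visible_points: dict mapping image id -> set of observed map point ids.
    min_shared: hypothetical threshold on shared points for a covisibility link.
    Returns the surviving candidates with their original rank order preserved.
    """
    unvisited = set(candidates)
    best = []
    for seed in candidates:
        if seed not in unvisited:
            continue
        unvisited.discard(seed)
        cluster, stack = [seed], [seed]
        # Grow the cluster by following covisibility links (depth-first).
        while stack:
            cur = stack.pop()
            linked = [c for c in unvisited
                      if len(visible_points[cur] & visible_points[c]) >= min_shared]
            for c in linked:
                unvisited.discard(c)
                cluster.append(c)
                stack.append(c)
        # Strict comparison: on ties, the cluster seeded by the better rank wins.
        if len(cluster) > len(best):
            best = cluster
    keep = set(best)
    return [c for c in candidates if c in keep]


ranked = ["a", "b", "c", "d"]
points = {"a": {1, 2, 3}, "b": {100, 101}, "c": {2, 3, 4}, "d": {3, 4, 5}}
survivors = covisibility_filter(ranked, points)
```

In this toy input, image `b` shares no map points with the others and is dropped as a retrieval outlier, while `a`, `c`, and `d` form one chain of covisible views.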
Abstract
Cross-modal localization has drawn increasing attention in recent years, yet visual relocalization in prior LiDAR maps remains less studied. Related methods usually suffer from inconsistency between the 2D texture and 3D geometry, and neglect the intensity features in the LiDAR point cloud. In this paper, we propose a cross-modal visual relocalization system in prior LiDAR maps utilizing intensity textures, which consists of three main modules: map projection, coarse retrieval, and fine relocalization. In the map projection module, we construct a database of intensity-channel map images, leveraging the dense characteristic of panoramic projection. The coarse retrieval module retrieves the top-K map images most similar to the query image from the database, and retains the top-K' results by covisibility clustering. The fine relocalization module applies a two-stage 2D-3D association and a covisibility inlier selection method to obtain robust correspondences for 6DoF pose estimation. Experimental results on our self-collected datasets demonstrate the system's effectiveness in both place recognition and pose estimation tasks.
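To make the map projection idea concrete, the sketch below renders LiDAR points with intensity values into a panoramic image. It assumes a plain equirectangular (azimuth/elevation) projection with a nearest-point depth test, which is a simplification and not the enhanced projection used in the paper. The function name `project_to_panorama`, the image size, and the input layout `(x, y, z, intensity)` in the virtual camera frame are illustrative assumptions.

```python
import math


def project_to_panorama(points, width=64, height=32):
    """Render (x, y, z, intensity) points into an equirectangular intensity image.

    Assumes points are already expressed in the virtual camera frame.
    Returns (image, depth) as height x width nested lists; empty pixels keep
    intensity 0.0 and depth infinity.
    """
    image = [[0.0] * width for _ in range(height)]
    depth = [[math.inf] * width for _ in range(height)]
    for x, y, z, intensity in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # a point at the camera centre has no viewing direction
        azimuth = math.atan2(y, x)    # in [-pi, pi]
        elevation = math.asin(z / r)  # in [-pi/2, pi/2]
        # Map azimuth to columns and elevation to rows (top row looks up).
        u = int((azimuth + math.pi) / (2 * math.pi) * width) % width
        v = min(int((math.pi / 2 - elevation) / math.pi * height), height - 1)
        # Depth test: keep only the closest point per pixel.
        if r < depth[v][u]:
            depth[v][u] = r
            image[v][u] = intensity
    return image, depth


cloud = [(2.0, 0.0, 0.0, 0.9), (1.0, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0, 0.7)]
img, dep = project_to_panorama(cloud)
```

The first two points fall on the same pixel, so the nearer one (intensity 0.5) wins. Keeping the per-pixel depth alongside the intensity is what later lets a 2D match in the map image be lifted back to a 3D point for pose estimation.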
