Cross-Modal Visual Relocalization in Prior LiDAR Maps Utilizing Intensity Textures

Qiyuan Shen, Hengwang Zhao, Weihao Yan, Chunxiang Wang, Tong Qin, Ming Yang

TL;DR

The paper tackles cross-modal visual relocalization: localizing a camera image within a prior LiDAR map by using LiDAR intensity textures. It introduces a three-module pipeline: offline map projection to panoramic intensity images with an enhanced HEC projection, coarse retrieval with covisibility-based reordering, and fine relocalization via two-stage 2D-3D association, covisibility inlier selection, and PnP with RANSAC to recover a 6DoF pose. Key contributions include (1) intensity-driven map image databases for robust cross-modal matching, (2) covisibility clustering to reduce outliers in retrieval, and (3) a two-stage local feature matching regime that yields reliable 2D-3D correspondences for accurate pose estimation. Experiments on self-collected campus datasets show strong place recognition and pose estimation performance, illustrating the method's practical value for initializing SLAM and for loop closure in cross-modal settings.
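
To make the map projection step concrete, the sketch below renders LiDAR points with intensity values into a panoramic intensity image. It is only an illustration under stated assumptions: it uses a plain equirectangular projection, not the paper's enhanced HEC projection, and the function name `project_to_panorama`, the image size, and the nearest-point-wins rule are all hypothetical choices rather than details taken from the paper.

```python
import math


def project_to_panorama(points, center, width=1024, height=512):
    """Render (x, y, z, intensity) points into an equirectangular image.

    Assumption: a plain equirectangular model stands in for the paper's
    enhanced projection. The nearest point wins each pixel (a z-buffer).
    Returns (intensity_image, depth_image) as nested lists [row][col].
    """
    image = [[0.0] * width for _ in range(height)]
    depth = [[math.inf] * width for _ in range(height)]
    cx, cy, cz = center
    for x, y, z, intensity in points:
        dx, dy, dz = x - cx, y - cy, z - cz
        r = math.sqrt(dx * dx + dy * dy + dz * dz)
        if r == 0.0:
            continue  # a point at the virtual camera center has no direction
        azimuth = math.atan2(dy, dx)  # in [-pi, pi]
        elevation = math.asin(max(-1.0, min(1.0, dz / r)))  # in [-pi/2, pi/2]
        # Column from azimuth (wraps around), row from elevation (top = up).
        u = int((azimuth + math.pi) / (2 * math.pi) * width) % width
        v = min(int((math.pi / 2 - elevation) / math.pi * height), height - 1)
        if r < depth[v][u]:
            depth[v][u] = r
            image[v][u] = intensity
    return image, depth
```

Keeping the depth image alongside the intensity image means each pixel can later be traced back to a 3D range along its viewing ray, which is the kind of bookkeeping a 2D-3D association step needs.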

Abstract

Cross-modal localization has drawn increasing attention in recent years, but visual relocalization in prior LiDAR maps remains less studied. Related methods usually suffer from inconsistency between the 2D texture and 3D geometry, and they neglect the intensity features in the LiDAR point cloud. In this paper, we propose a cross-modal visual relocalization system in prior LiDAR maps utilizing intensity textures, which consists of three main modules: map projection, coarse retrieval, and fine relocalization. In the map projection module, we construct a database of intensity-channel map images, leveraging the dense characteristic of panoramic projection. The coarse retrieval module retrieves the top-K most similar map images to the query image from the database, and retains the top-K' results by covisibility clustering. The fine relocalization module applies a two-stage 2D-3D association and a covisibility inlier selection method to obtain robust correspondences for 6DoF pose estimation. The experimental results on our self-collected datasets demonstrate the system's effectiveness in both place recognition and pose estimation tasks.
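
The reduction from top-K to top-K' retrieval results can be illustrated with a small sketch. The abstract does not spell out the clustering rule, so everything here is an assumption: two candidates are treated as covisible when they share at least `min_shared` visible map point IDs, the cluster with the highest summed similarity is kept, and the function name `covisibility_rerank` and its input layout are hypothetical.

```python
def covisibility_rerank(candidates, k_prime, min_shared=1):
    """Keep up to k_prime candidates from the best covisibility cluster.

    candidates: list of (image_id, similarity, visible_point_ids) where
    visible_point_ids is a set of map point IDs seen by that map image.
    Assumption: clusters are connected components of the covisibility
    graph; the winning cluster has the highest total similarity.
    """
    if not candidates:
        return []

    # Union-find over candidate indices.
    parent = list(range(len(candidates)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of candidates that observes enough common points.
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            shared = candidates[i][2] & candidates[j][2]
            if len(shared) >= min_shared:
                parent[find(i)] = find(j)

    # Collect the connected components.
    clusters = {}
    for i, cand in enumerate(candidates):
        clusters.setdefault(find(i), []).append(cand)

    # Pick the cluster with the highest summed similarity, then rank it.
    best = max(clusters.values(), key=lambda c: sum(s for _, s, _ in c))
    best.sort(key=lambda c: c[1], reverse=True)
    return [image_id for image_id, _, _ in best[:k_prime]]


top_k = [
    ("a", 0.90, {1, 2, 3}),
    ("b", 0.80, {3, 4}),
    ("c", 0.85, {100, 101}),  # visually similar but spatially isolated
    ("d", 0.70, {4, 5}),
]
top_k_prime = covisibility_rerank(top_k, k_prime=2)
```

In this toy input, candidate `c` scores well on similarity but shares no map points with the others, so it is dropped as a likely retrieval outlier even though it outranks `b` and `d` individually.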

Paper Structure

This paper contains 19 sections, 4 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: The cross-modal visual relocalization in prior LiDAR maps. Given a camera image as a query, the system aims to determine its 6DoF pose in the prior LiDAR map.
  • Figure 2: The hierarchical framework of the proposed cross-modal visual relocalization system in prior LiDAR maps.
  • Figure 3: The consistency comparison of different projection models between the equalized map image and the grayscale image. The outlined area in (a), recognized as a road lamp, is sparse, while the area in (b) is much denser, which benefits the subsequent modules.
  • Figure 4: The two-stage 2D-3D association results. After the covisibility clustering shown in the upper right corner, the top-K' candidates are matched with the query image. The lower part shows the first stage of association, while the upper left corner shows the second-stage results within the bounding box.