LiM-Loc: Visual Localization with Dense and Accurate 3D Reference Maps Directly Corresponding 2D Keypoints to 3D LiDAR Point Clouds

Masahiko Tsuji, Hitoshi Niigaki, Ryuichi Tanida

TL;DR

This work tackles visual localization by constructing dense, accurate 3D reference maps and using direct 2D-3D matching to estimate the camera pose with PnP in a 3D map. LiM-Loc directly assigns 2D keypoints to 3D LiDAR points, avoiding traditional feature matching and applying Hidden Point Removal with spherical shell compression to prune occlusions, while Reference Image Reduction reduces map-generation time. The approach yields a dense reference map, improves 2D-3D inliers, and demonstrates accuracy improvements across indoor and outdoor datasets for multiple local features. The results show errors of only a few centimeters and faster map generation, highlighting practical benefits for autonomous systems and robotics.

Abstract

Visual localization estimates the 6-DOF camera pose of a query image in a 3D reference map. We extract keypoints from the reference images and generate the 3D reference map in advance by reconstructing those keypoints in 3D. We emphasize that the more keypoints the 3D reference map contains and the smaller the error in their 3D positions, the higher the accuracy of the camera pose estimation. However, previous image-only methods require a huge number of images, and it is difficult to reconstruct keypoints in 3D without error because mismatches and failures in feature matching are inevitable. As a result, the 3D reference map is sparse and inaccurate. In contrast, accurate 3D reference maps can be generated by combining images and 3D sensors. 3D LiDAR, which measures large spaces with high density, has recently become inexpensive and widely used. Accurately calibrated cameras are also widely used, so images recorded together with accurate extrinsic camera parameters can be easily obtained. In this paper, we propose a method that directly assigns 3D LiDAR points to keypoints to generate dense and accurate 3D reference maps. The proposed method avoids feature matching and achieves accurate 3D reconstruction for almost all keypoints. To estimate camera pose over a wide area, we use the wide-area LiDAR point cloud to remove points that are not visible to the camera and reduce 2D-3D correspondence errors. Using indoor and outdoor datasets, we apply the proposed method to several state-of-the-art local features and confirm that it improves the accuracy of camera pose estimation.
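The idea of assigning LiDAR points to keypoints without feature matching can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an ideal pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) and extrinsics (`R`, `t`), and it replaces the paper's LiDAR virtual image and Hidden Point Removal with a crude stand-in that simply keeps the nearest-depth point within a small pixel radius. The function names and the `max_pixel_dist` parameter are hypothetical and chosen only for illustration.

```python
import math


def project_points(points_world, R, t, fx, fy, cx, cy, width, height):
    """Project world-frame 3D points into the image with a pinhole model.

    Returns (u, v, depth, index) tuples for points that lie in front of
    the camera and inside the image bounds.
    """
    projected = []
    for idx, (X, Y, Z) in enumerate(points_world):
        # World -> camera frame: p_cam = R * p_world + t
        xc = R[0][0] * X + R[0][1] * Y + R[0][2] * Z + t[0]
        yc = R[1][0] * X + R[1][1] * Y + R[1][2] * Z + t[1]
        zc = R[2][0] * X + R[2][1] * Y + R[2][2] * Z + t[2]
        if zc <= 0:
            continue  # behind the camera
        u = fx * xc / zc + cx
        v = fy * yc / zc + cy
        if 0 <= u < width and 0 <= v < height:
            projected.append((u, v, zc, idx))
    return projected


def assign_keypoints(keypoints, projected, max_pixel_dist=1.5):
    """Map each 2D keypoint to the index of a 3D point.

    Among projected points within max_pixel_dist pixels of the keypoint,
    the one with the smallest depth is chosen (assumed stand-in for a
    proper visibility test). Keypoints with no candidate are left out.
    """
    assignments = {}
    for k, (ku, kv) in enumerate(keypoints):
        best = None  # (depth, point index)
        for u, v, depth, idx in projected:
            if math.hypot(u - ku, v - kv) <= max_pixel_dist:
                if best is None or depth < best[0]:
                    best = (depth, idx)
        if best is not None:
            assignments[k] = best[1]
    return assignments


# Toy data: identity pose, 100x100 image, focal length 100 px.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
lidar_points = [
    (0.0, 0.0, 2.0),   # visible surface at the image center
    (0.0, 0.0, 5.0),   # same pixel, but farther away (hidden)
    (1.0, 0.0, 4.0),   # projects to u = 75, v = 50
    (0.0, 0.0, -1.0),  # behind the camera, discarded
]
keypoints = [(50.0, 50.0), (75.4, 50.2), (10.0, 10.0)]

projected = project_points(lidar_points, R, t, 100.0, 100.0, 50.0, 50.0, 100, 100)
assignments = assign_keypoints(keypoints, projected)
print(assignments)
```

In this toy case the first keypoint receives the nearer of the two points that fall on the same pixel, the second keypoint receives the third point, and the third keypoint gets no 3D position because no LiDAR point projects near it. The resulting 2D-3D pairs are the kind of input a PnP solver would consume at query time.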

Paper Structure

This paper contains 15 sections, 5 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: LiM-Loc directly assigns 2D keypoints to 3D LiDAR point clouds to generate a dense and accurate 3D reference map. This method collaborates with a variety of state-of-the-art local features to 3D reconstruct almost every keypoint. LiM-Loc estimates with an error of less than a few centimeters, which is difficult with image-only methods.
  • Figure 2: LiM-Loc pipeline consists of (i) reprojecting the LiDAR point cloud onto a realistic LiDAR virtual image, (ii) extracting local image features, and (iii) directly assigning 2D keypoints to the LiDAR virtual image by overlapping them.
  • Figure 3: (a) Conventional HPR, which assumes object-scale point clouds, has difficulty handling spatial-scale point clouds. We compress the spatial-scale point cloud into a spherical shell, converting it into an object-scale point cloud while preserving visibility from the camera. (b) Without HPR, hidden points appear as noise, which leads to misassignment in 2D-3D correspondence.
  • Figure 4: We measured outdoor and indoor datasets using LiDAR while walking. Equipped with a perfectly calibrated camera, we were able to easily capture images with extrinsic parameters.
  • Figure 5: The impact of (a) reducing the number of keypoints and (b) increasing the 3D position error of keypoints is seen both indoors and outdoors. When the number of keypoints is reduced by 80%, accuracy starts to degrade. When the 3D position error is larger than 0.10 m, estimation with an error below 0.10 m becomes difficult.
  • ...and 1 more figures