ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition

Weidong Xie; Lun Luo; Nanfei Ye; Yi Ren; Shaoyi Du; Minhang Wang; Jintao Xu; Rui Ai; Weihao Gu; Xieyuanli Chen

TL;DR

This work introduces a fast and lightweight framework that encodes images and point clouds into place-distinctive descriptors, and proposes an effective Field of View (FoV) transformation module that converts point clouds into a modality analogous to images.

Abstract

Place recognition is an important task for robots and autonomous cars to localize themselves and close loops in pre-built maps. While single-modal sensor-based methods have shown satisfactory performance, cross-modal place recognition, which matches query images against a point-cloud database, remains a challenging problem. Current cross-modal methods transform images into 3D points using depth estimation for modality conversion, which is usually computationally intensive and needs expensive labeled data for depth supervision. In this work, we introduce a fast and lightweight framework to encode images and point clouds into place-distinctive descriptors. We propose an effective Field of View (FoV) transformation module to convert point clouds into a modality analogous to images. This module eliminates the need for depth estimation and helps subsequent modules achieve real-time performance. We further design a non-negative factorization-based encoder to extract mutually consistent semantic features between point clouds and images. This encoder yields more distinctive global descriptors for retrieval. Experimental results on the KITTI dataset show that our proposed methods achieve state-of-the-art performance while running in real time. Additional evaluation on the HAOMO dataset, covering a 17 km trajectory, further demonstrates the practical generalization capability of our methods. We have released the implementation of our methods as open source at: https://github.com/haomo-ai/ModaLink.git.
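To make the idea of converting a point cloud into an image-like modality concrete, the following is a minimal sketch of projecting LiDAR points into a sparse depth image with a pinhole camera model. It is not the released ModaLink code: the function name `project_to_depth_image`, the nested-list matrix layout, and the nearest-point rule for pixels hit more than once are assumptions made for illustration, and the cropping and depth completion steps of the FoV transformation are not covered.

```python
import math


def project_to_depth_image(points, K, T, width, height):
    """Project LiDAR points into a sparse depth image (illustrative sketch).

    points: iterable of (x, y, z) coordinates in the LiDAR frame.
    K: 3x3 camera intrinsic matrix as nested lists.
    T: 4x4 LiDAR-to-camera extrinsic matrix as nested lists.
    Returns a height x width grid; 0.0 marks pixels with no projected point.
    """
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        # Move the point from the LiDAR frame into the camera frame.
        xc = T[0][0] * x + T[0][1] * y + T[0][2] * z + T[0][3]
        yc = T[1][0] * x + T[1][1] * y + T[1][2] * z + T[1][3]
        zc = T[2][0] * x + T[2][1] * y + T[2][2] * z + T[2][3]
        if zc <= 0:
            continue  # the point lies behind the camera
        # Pinhole projection onto the image plane.
        u = (K[0][0] * xc + K[0][1] * yc + K[0][2] * zc) / zc
        v = (K[1][0] * xc + K[1][1] * yc + K[1][2] * zc) / zc
        col, row = math.floor(u), math.floor(v)
        # Keep only points inside the camera field of view.
        if 0 <= col < width and 0 <= row < height:
            # Assumption: when several points hit one pixel, keep the nearest.
            if depth[row][col] == 0.0 or zc < depth[row][col]:
                depth[row][col] = zc
    return depth
```

With an identity extrinsic matrix and a principal point at (50, 50), a point on the optical axis lands at pixel row 50, column 50. Most pixels stay empty, which is why a sparse depth image of this kind needs a completion step before it resembles a dense image.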

Paper Structure (18 sections, 4 equations, 11 figures, 5 tables)

This paper contains 18 sections, 4 equations, 11 figures, 5 tables.

Introduction
Related Work
Visual Place Recognition (VPR)
LiDAR Place Recognition (LPR)
Image to Point Cloud Place Recognition
Image-to-pointcloud Place Recognition based on FoV Transformation
FoV Transformation
Novel NMF-Based Encoder Network
Training And Inference
Experiments
Datasets
Evaluation Metrics
Implementation Setup
Performance on the KITTI
Performance on the HAOMO dataset
...and 3 more sections

Figures (11)

Figure 1: The purpose of ModaLink is to retrieve, from a pre-built large-scale point cloud database, the point cloud that best corresponds to the query image.
Figure 2: The training pipeline of our proposed ModaLink framework. Point clouds are converted to depth images via projection. Then, the query image and depth images are cropped to their overlapping FoV. Using the depth completion module, sparse depth images are upsampled to dense depth images. Then, global place descriptors are generated by a shared-weight encoder. Finally, we adopt triplet loss for supervision.
Figure 3: Visualization of image and point cloud alignment based on the intrinsic and extrinsic matrices. After alignment, we crop the images based on FoV overlap and complete sparse depth images to generate dense depth images.
Figure 4: Three cases of depth image completion. The red point represents a pixel on the image plane where no LiDAR point is projected.
Figure 5: A demonstration of depth image interpolation.
...and 6 more figures
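The Figure 2 caption mentions triplet loss supervision on global descriptors. As a rough illustration, the sketch below implements the standard margin-based triplet loss on plain Python lists. The Euclidean distance, the `margin` value of 0.5, and the function names are assumptions; the exact loss variant and hyperparameters used by the authors are not stated in this summary.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Standard margin-based triplet loss (illustrative sketch).

    anchor: descriptor of the query, e.g. a camera image.
    positive: descriptor of a sample from the same place.
    negative: descriptor of a sample from a different place.
    margin: hypothetical value, chosen only for this example.
    """
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    # The loss reaches zero once the negative is at least `margin`
    # farther from the anchor than the positive.
    return max(0.0, d_pos - d_neg + margin)
```

Minimizing this quantity pulls descriptors of the same place together and pushes descriptors of different places apart, which is what makes nearest-neighbor retrieval over a descriptor database meaningful.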
