Attention-Enhanced Cross-modal Localization Between 360 Images and Point Clouds

Zhipeng Zhao; Huai Yu; Chenwei Lyv; Wen Yang; Sebastian Scherer

TL;DR

This work proposes an end-to-end learnable network for cross-modal visual localization that matches 360 equirectangular images to point clouds by establishing similarity in a high-dimensional feature space. Inspired by the attention mechanism, the network is optimized to capture the salient features for comparing images and point clouds.

Abstract

Visual localization plays an important role in intelligent robots and autonomous driving, especially when GNSS is unreliable. Recently, camera localization in LiDAR maps has attracted increasing attention for its low cost and potential robustness to illumination and weather changes. However, the commonly used pinhole camera has a narrow field of view and therefore captures limited information compared with omnidirectional LiDAR data. To overcome this limitation, we focus on correlating the information in 360 equirectangular images with point clouds, and propose an end-to-end learnable network that performs cross-modal visual localization by establishing similarity in a high-dimensional feature space. Inspired by the attention mechanism, we optimize the network to capture the salient features for comparing images and point clouds. We construct several sequences containing 360 equirectangular images and corresponding point clouds based on the KITTI-360 dataset and conduct extensive experiments. The results demonstrate the effectiveness of our approach.
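The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes the network has already produced one global descriptor per query image and per point cloud sub-map, and that "closest" means smallest Euclidean distance between descriptors. The distance choice, the function names, and the toy 3-dimensional descriptors are illustrative assumptions, not details taken from the paper.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_top_k(query_descriptor, submap_descriptors, k=1):
    """Return the indices of the k sub-maps closest to the query.

    Hypothetical helper: descriptors are plain lists of floats that are
    assumed to come from the image and point cloud branches of a network.
    """
    scored = [
        (euclidean_distance(query_descriptor, d), i)
        for i, d in enumerate(submap_descriptors)
    ]
    scored.sort()  # ascending distance, ties broken by index
    return [i for _, i in scored[:k]]


# Toy example: one query image descriptor, three sub-map descriptors.
query = [0.9, 0.1, 0.0]
submaps = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]
top2 = retrieve_top_k(query, submaps, k=2)
print(top2)
```

In this toy case the query lies nearest to sub-map 1, so that sub-map's location would be taken as the estimated position.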

Paper Structure (26 sections, 10 equations, 7 figures, 4 tables)

Introduction
Related Work
Image based retrieval
Point cloud based retrieval
Cross-modal based retrieval
Approach
Overview
The Network Architecture
Feature Extraction
Image Feature Extraction
Point Cloud Feature Extraction
Attention Enhancement
Global Description Aggregation
Metric Learning
Implementation Detail
...and 11 more sections

Figures (7)

Figure 1: Comparison of spherical images and perspective images with their point cloud counterparts. The right side shows the point clouds corresponding to the images on the left, which were obtained at the same location.
Figure 2: A schematic of the cross-modal localization. Localization is performed by comparing the query 360 image with the point cloud sub-maps from the global map and then finding the closest sub-map to determine the location.
Figure 3: The architecture of our model for cross-modal localization. The inputs are the 360 image and the point cloud sub-map.
Figure 4: The results of cross-modal localization. The second and third rows show the top-1 point cloud sub-map retrieved with the 360 image by the ResNet-based Baseline and the AE-Spherical Model, where a green frame indicates a correct result and a red frame indicates an incorrect result.
Figure 5: Recall@k of the AE-Spherical Model and the ResNet-based Baseline for four tasks: same-modal localization (a), (b) and cross-modal localization (c), (d).
...and 2 more figures
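
The Recall@k measure named in the Figure 5 caption can be sketched as follows. This assumes the common place-recognition definition: a query counts as a hit if at least one correct sub-map appears among its top-k retrieved results. The paper's exact criterion for a correct match is not given in this summary, so the ground-truth sets and the helper name below are hypothetical.

```python
def recall_at_k(ranked_lists, ground_truth, k):
    """Fraction of queries with at least one correct sub-map in the top k.

    ranked_lists: per query, sub-map indices ordered best match first.
    ground_truth: per query, the set of sub-map indices counted as correct
                  (an assumed input; how it is built is not specified here).
    """
    if not ranked_lists:
        return 0.0
    hits = 0
    for ranked, positives in zip(ranked_lists, ground_truth):
        if any(idx in positives for idx in ranked[:k]):
            hits += 1
    return hits / len(ranked_lists)


# Toy example with three queries and made-up rankings.
ranked = [[3, 1, 2], [0, 2, 1], [2, 0, 3]]
truth = [{1}, {0}, {3}]
for k in (1, 2, 3):
    print(k, recall_at_k(ranked, truth, k))
```

Recall@k never decreases as k grows, since a larger k only adds candidates to each query's retrieved list.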
