EEPNet: Efficient Edge Pixel-based Matching Network for Cross-Modal Dynamic Registration between LiDAR and Camera

Yuanchao Yue; Hui Yuan; Suai Li; Qi Jiang

TL;DR

EEPNet, an efficient edge pixel-based matching network, leverages reflectance maps obtained from point cloud projections to improve LiDAR-camera registration accuracy and markedly accelerates real-time registration.

Abstract

Multisensor fusion is essential for autonomous vehicles to accurately perceive, analyze, and plan their trajectories within complex environments. This typically involves integrating data from LiDAR sensors and cameras, which requires high-precision, real-time registration between the two modalities. Current methods for registering LiDAR point clouds with images face significant challenges due to inherent modality differences and computational overhead. To address these issues, we propose EEPNet, a network that leverages reflectance maps obtained from point cloud projections to enhance registration accuracy. Projecting the point cloud substantially mitigates cross-modality differences at the network input level, while the inclusion of reflectance data improves performance in scenarios where the point cloud provides limited spatial information within the camera's field of view. Furthermore, by using edge pixels for feature matching and incorporating an efficient matching optimization layer, EEPNet markedly accelerates real-time registration. Experimental validation demonstrates that EEPNet achieves superior accuracy and efficiency compared with state-of-the-art methods. These contributions advance autonomous perception systems and pave the way for robust and efficient sensor fusion in real-world applications.
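The first step described above, projecting the point cloud into a reflectance map, can be illustrated with a standard spherical (range-image style) projection. The sketch below is an assumption, not the authors' code: the map size (64 x 1024), the vertical field of view (+3 to -25 degrees, typical of a 64-beam sensor), the nearest-point-wins rule, and the helper name `reflectance_map` are all hypothetical choices. By default, rows follow the vertical angle of each point, as in Figure 1(a); passing per-point laser IDs indexes rows by LaserID instead, as in Figure 1(b).

```python
import numpy as np

def reflectance_map(points, reflectance, height=64, width=1024,
                    fov_up_deg=3.0, fov_down_deg=-25.0, ring=None):
    """Hypothetical helper: project LiDAR points onto a 2D reflectance map.

    points: (N, 3) array of x, y, z in the LiDAR frame.
    reflectance: (N,) per-point reflectance/intensity values.
    ring: optional (N,) integer laser IDs; if given, the row index is the
          laser ID instead of being derived from the vertical angle.
    The FOV defaults are assumed values for a 64-beam sensor.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)  # azimuth in [-pi, pi]
    # Column index: azimuth mapped linearly onto [0, width).
    u = np.floor(0.5 * (1.0 - yaw / np.pi) * width).astype(int)
    if ring is None:
        # Row index: vertical angle mapped onto [0, height) within the FOV.
        pitch = np.arcsin(np.clip(z / np.maximum(depth, 1e-8), -1.0, 1.0))
        fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
        v = np.floor((fov_up - pitch) / (fov_up - fov_down) * height).astype(int)
    else:
        v = np.asarray(ring, dtype=int)
    u = np.clip(u, 0, width - 1)
    v = np.clip(v, 0, height - 1)

    img = np.zeros((height, width), dtype=np.float32)
    # When several points fall in one pixel, keep the nearest one:
    # sort by depth, then take the first occurrence of each pixel index.
    order = np.argsort(depth)
    pixel_ids = v[order] * width + u[order]
    _, first = np.unique(pixel_ids, return_index=True)
    keep = order[first]
    img[v[keep], u[keep]] = reflectance[keep]
    return img
```

For example, a point straight ahead at `[10, 0, 0]` lands in column 512 and, with the assumed FOV, row 6. A farther point along the same ray is occluded by it, so its reflectance does not appear in the map.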

Paper Structure (27 sections, 8 equations, 8 figures, 5 tables)

Introduction
Related Work
  Same-Modality Registration
  Cross-Modality Registration
Method
  Reflectance Map
  Feature Extraction Network
  Edge Pixel Extraction
  Feature Embedding
  Matching Optimization Network
  Loss Function
  Transformation Matrix Estimation
Experiments
  Dataset
  Baselines and Metrics
...and 12 more sections

Figures (8)

Figure 1: Processing stages for projecting a 3D point cloud into a reflectance map and for the subsequent processing of that map. (a) Projection map whose vertical axis corresponds to the spherical coordinate $\theta$. (b) Projection map whose vertical axis corresponds to the LiDAR's LaserID. (c) The projection map from (b) after wavelet filtering. (d) Edge pixels extracted from (c) using the edge extraction algorithm.
Figure 2: Processing of the camera image. (a) Single-channel grayscale image formed from the R channel of the camera image. (b) Edge pixels obtained by applying edge detection to (a).
Figure 3: Neural network structure for feature extraction. The network takes the reflectance map $\textbf{I}_r$ and the camera image $\textbf{I}_c$ as inputs. Edge pixel extraction is applied to both inputs to obtain their respective edge pixel sequences. The network outputs features $\textbf{d}_r$ for the edge pixels of the reflectance map and features $\textbf{d}_c$ for the edge pixels of the camera image.
Figure 4: Structure of the Matching Optimization Network, showing how the edge pixel features $\mathbf{d}_r$ and $\mathbf{d}_c$ are turned into correspondence decisions via the partial assignment matrix $\mathbf{P}$. Linear blocks represent linear neural network layers with varying output dimensions; Softmax operations along matrix rows and columns are indicated by differently colored arrows.
Figure 5: Detailed network structure of the feature extraction module in which the dimensions of the features are also indicated. Conv2D represents a 2D convolutional layer, BN denotes a batch normalization layer, and ReLU indicates the ReLU activation function.
...and 3 more figures
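Figure 4 describes linear layers followed by softmax along the rows and columns of a score matrix to produce the partial assignment matrix $\mathbf{P}$. One common way to realize this is dual-softmax matching. The sketch below assumes that formulation and is not the authors' implementation: fixed matrices `W_r` and `W_c` stand in for the learned linear layers, and the mutual-nearest-neighbor check and confidence `threshold` used to extract correspondences are hypothetical choices.

```python
import numpy as np

def softmax(x, axis):
    # Subtract the max along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dual_softmax_match(d_r, d_c, W_r, W_c, threshold=0.2):
    """Hypothetical dual-softmax matcher.

    d_r: (M, D) features of reflectance-map edge pixels.
    d_c: (N, D) features of camera-image edge pixels.
    W_r, W_c: (D, D') stand-ins for learned linear layers.
    Returns the (M, N) assignment matrix P and a list of (i, j) matches.
    """
    f_r = d_r @ W_r
    f_c = d_c @ W_c
    scores = f_r @ f_c.T / np.sqrt(f_r.shape[1])           # (M, N) similarities
    # Elementwise product of row-wise and column-wise softmax.
    P = softmax(scores, axis=1) * softmax(scores, axis=0)
    best_c = P.argmax(axis=1)  # best camera pixel for each reflectance pixel
    best_r = P.argmax(axis=0)  # best reflectance pixel for each camera pixel
    # Keep mutual nearest neighbours whose confidence exceeds the threshold.
    matches = [(i, int(j)) for i, j in enumerate(best_c)
               if best_r[j] == i and P[i, j] > threshold]
    return P, matches
```

With strongly distinctive features, such as a permuted set of scaled one-hot vectors, each row and column of `P` concentrates near 1 at the true correspondence, and all four pairs are recovered. With uninformative (all-zero) features, `P` is uniform at 1/16 and the threshold rejects every candidate.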