LiRaFusion: Deep Adaptive LiDAR-Radar Fusion for 3D Object Detection

Jingyu Song; Lingjun Zhao; Katherine A. Skinner

TL;DR

LiRaFusion addresses the performance gap in LiDAR-radar fusion for 3D object detection with two components: an early-fusion joint voxel feature encoder and a channel-wise adaptive gated middle-fusion module. Together they enable robust cross-modality feature extraction and fusion within a bird's-eye-view (BEV) framework. On the nuScenes dataset, the method outperforms existing LiDAR-radar (LR) and LiDAR-camera-radar (LCR) methods, with notable gains at long range and under rain. The results also show that LiDAR and radar can be combined effectively with a gated fusion strategy, and that the approach extends to LiDAR-camera-radar fusion. The work has practical impact for safer autonomous driving under adverse conditions and offers a reusable fusion backbone for multi-modality detectors.

Abstract

We propose LiRaFusion to tackle LiDAR-radar fusion for 3D object detection and to fill the performance gap of existing LiDAR-radar detectors. To improve feature extraction from these two modalities, we design an early fusion module for joint voxel feature encoding and a middle fusion module that adaptively fuses feature maps via a gated network. We perform extensive evaluation on nuScenes to demonstrate that LiRaFusion leverages the complementary information of LiDAR and radar effectively and achieves notable improvement over existing methods.

Paper Structure (15 sections, 5 figures, 8 tables)

Introduction
Related Work
Radar Datasets for Autonomous Driving
Multi-modality 3D Object Detection
Gated Network for Sensor Fusion
Method
Early Fusion
Middle Fusion
Experiments
Experiment Design
Results and Comparison
Qualitative Results
Ablation Studies
LiDAR-Camera-Radar Fusion
Conclusion

Figures (5)

Figure 1: We propose LiRaFusion to efficiently leverage the complementary information of LiDAR and radar for 3D object detection.
Figure 2: Overview of the architecture of LiRaFusion. Our main contributions, shown in bold text, are a joint voxel feature encoder that extracts per-voxel features from the stacked point cloud, and a gated network that learns weights for each input feature map to fuse them adaptively.
Figure 3: The network architecture of the early fusion module. We stack the loaded LiDAR and radar points by zero-padding them to the same number of dimensions before feeding them into the proposed joint voxel feature encoder.
Figure 4: The network architecture for the middle fusion module. In this module, by applying a channel-wise convolution and a sigmoid function to the concatenated LiDAR-radar feature map, the network generates adaptive weights for LiDAR and radar separately. Then the input LiDAR and radar feature maps are element-wise multiplied with the weights before being concatenated as a fused LiDAR-radar feature map.
Figure 5: Example bounding box predictions and corresponding weight maps. We present two frames in which LiRaFusion correctly detects a car (highlighted with a red circle) that is missed by the baseline LiDAR-only (LO) detector. We also show a zoomed-in view in which radar points are labeled in magenta and LiDAR points in gray, or in red if they lie inside a bounding box. Ground truth bounding boxes are shown in blue and predictions in green. In the weight map visualizations, the black bounding box with an arrow denotes the ego-vehicle, and boxes without an arrow denote the highlighted missed car. Best viewed in color and zoomed in.
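
The captions of Figures 3 and 4 describe two concrete operations: stacking LiDAR and radar points after zero-padding them to a common width, and gating each feature map with sigmoid weights computed from the concatenated map. The NumPy sketch below illustrates both under stated assumptions and is not the authors' implementation. The function names (`stack_points`, `gated_fusion`), the right-padding column layout, and the use of a 1x1 convolution (written as an `einsum` over channels) are hypothetical choices made for illustration; the captions do not specify the kernel size or the point feature layout.

```python
import numpy as np


def stack_points(lidar_pts, radar_pts):
    """Zero-pad both point sets to the same feature width, then stack them.

    Assumption: each array is (num_points, num_features) and the shorter
    one is padded on the right. The real column layout may differ.
    """
    width = max(lidar_pts.shape[1], radar_pts.shape[1])

    def pad(pts):
        # np.pad defaults to constant mode with zeros.
        return np.pad(pts, ((0, 0), (0, width - pts.shape[1])))

    return np.concatenate([pad(lidar_pts), pad(radar_pts)], axis=0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(f_lidar, f_radar, k_lidar, k_radar):
    """Fuse two BEV feature maps with learned per-channel gates.

    f_lidar: (C_l, H, W), f_radar: (C_r, H, W)
    k_lidar: (C_l, C_l + C_r), k_radar: (C_r, C_l + C_r)
    The kernels stand in for learned 1x1 convolutions (an assumption).
    """
    f_cat = np.concatenate([f_lidar, f_radar], axis=0)  # (C_l + C_r, H, W)
    # Each gate is computed from the concatenated map, one weight per
    # channel and BEV cell of the modality it will scale.
    gate_l = sigmoid(np.einsum("oc,chw->ohw", k_lidar, f_cat))
    gate_r = sigmoid(np.einsum("oc,chw->ohw", k_radar, f_cat))
    # Element-wise reweighting, then concatenation into the fused map.
    return np.concatenate([f_lidar * gate_l, f_radar * gate_r], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = stack_points(rng.normal(size=(100, 5)), rng.normal(size=(20, 7)))
    print(points.shape)  # (120, 7)

    fused = gated_fusion(
        rng.normal(size=(4, 8, 8)),
        rng.normal(size=(3, 8, 8)),
        rng.normal(size=(4, 7)),
        rng.normal(size=(3, 7)),
    )
    print(fused.shape)  # (7, 8, 8)
```

Because each gate lies in (0, 1), the network can suppress one modality's features in a given cell without discarding them from the fused map, which keeps the output channel count equal to the sum of the two inputs.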
