RaLiBEV: Radar and LiDAR BEV Fusion Learning for Anchor Box Free Object Detection Systems

Yanlong Yang, Jianan Liu, Tao Huang, Qing-Long Han, Gang Ma, Bing Zhu

TL;DR

RaLiBEV tackles robust 3D perception for autonomous driving under adverse weather by proposing an anchor-box-free BEV fusion framework that integrates radar range–azimuth heatmaps with LiDAR point clouds. It introduces Gaussian Area-based Consistent Heatmap and IoU Cost Positive Sample (GACHIPS) for label assignment and a Dense Query Map-based Interactive Transformer for BEV Fusion (DQMITBF) module for symmetric radar–LiDAR feature fusion. In experiments on the Oxford Radar RobotCar dataset, RaLiBEV substantially outperforms state-of-the-art fusion methods in both clear and foggy conditions, achieving higher AP at IoU thresholds up to 0.8. The approach advances real-time, weather-robust vehicle detection by combining radar's weather-penetrating properties with LiDAR's geometric precision in a single BEV framework.

Abstract

In autonomous driving, LiDAR and radar are crucial for environmental perception. LiDAR offers precise 3D spatial sensing but struggles in adverse weather such as fog. Conversely, radar signals can penetrate rain or mist because of their wavelength, but they are prone to noise. Recent state-of-the-art works show that fusing radar and LiDAR can lead to robust detection in adverse weather. Existing works adopt convolutional neural network architectures to extract features from each sensor's data, then align and aggregate the features of the two branches to predict object detection results. However, these methods predict bounding boxes with low accuracy because their label assignment and fusion strategies are simple. In this paper, we propose a bird's-eye view fusion learning-based anchor box-free object detection system, which fuses the features derived from the radar range-azimuth heatmap and the LiDAR point cloud to estimate possible objects. Several label assignment strategies are designed to improve the consistency between the foreground/background classification of anchor points and the corresponding bounding box regressions. The performance of the proposed object detector is further enhanced by a novel interactive transformer module. The superior performance of the proposed methods is demonstrated on the recently published Oxford Radar RobotCar dataset. Our system's average precision significantly outperforms the state-of-the-art method by 13.1% and 19.0% at an Intersection over Union (IoU) of 0.8 under 'Clear+Foggy' training conditions for 'Clear' and 'Foggy' testing, respectively.
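
To make the IoU 0.8 criterion in the abstract concrete, the following is a minimal sketch, not the paper's evaluation code. It assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)` in metres; the boxes in the paper carry a heading direction, so a faithful evaluation would need a rotated-box IoU. The names `iou_axis_aligned` and `is_true_positive` are hypothetical.

```python
def iou_axis_aligned(box_a, box_b):
    """IoU of two axis-aligned boxes (x_min, y_min, x_max, y_max).

    Simplifying assumption: heading is ignored, unlike the rotated
    BEV boxes used in the paper.
    """
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Overlap extent along each axis, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.8):
    """A prediction counts as a match only if its IoU reaches the threshold."""
    return iou_axis_aligned(pred_box, gt_box) >= threshold


# Hypothetical 4 m x 2 m vehicle box and two shifted predictions.
gt = (0.0, 0.0, 4.0, 2.0)
small_shift = (0.2, 0.0, 4.2, 2.0)  # shifted 0.2 m along x
large_shift = (0.5, 0.0, 4.5, 2.0)  # shifted 0.5 m along x
```

Under these assumptions, the 0.2 m shift still passes the 0.8 threshold while the 0.5 m shift does not, which illustrates why accurate box regression matters at high IoU thresholds.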

Paper Structure (21 sections, 9 equations, 6 figures, 4 tables)

This paper contains 21 sections, 9 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Performance of RaLiBEV in clear and foggy weather. The visualization overlays the LiDAR point cloud (white) and the 2D object bounding boxes on the radar range-azimuth heatmap, which is rendered in jet pseudo-color as the background. The ground-truth boxes are orange, and the boxes predicted by RaLiBEV are red. All boxes use green lines to indicate the heading direction.
  • Figure 2: The entire pipeline of the proposed radar and LiDAR fusion-based anchor box free object detector, RaLiBEV.
  • Figure 3: Overview of label assignment strategies for object detection. (a) Identifies the object with a red-bordered ground-truth bounding box and a green heading line. Yellow and red ellipses represent ground-truth and predicted Gaussian distributions, respectively. Red dots are positive sample points, and green dots mark the ground-truth Gaussian centers. Strategies (b) to (e) show different methods for selecting positive samples for box loss calculation: (b) using all anchor points within the ground-truth area, (c) selecting the center, (d) selecting the point with the highest foreground-background classification score, and (e) selecting the point with the highest "score plus IoU". Strategy (f) combines the approach from (e) with an alternative loss function, the modified focal loss used in GACHIPS.
  • Figure 4: Comparison of direct transformer for BEV fusion with dense query map-based interactive transformer for BEV fusion.
  • Figure 5: Interactive transformer feature map visualization results.
  • ...and 1 more figure
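
The positive-sample selection strategies (b) to (e) in the Figure 3 caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: each candidate anchor point is assumed to already carry a foreground score and the IoU of its predicted box with the ground truth, "score plus IoU" is taken as an unweighted sum, and the `AnchorPoint` type and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnchorPoint:
    x: float
    y: float
    score: float  # foreground classification score (assumed precomputed)
    iou: float    # IoU of this point's predicted box with the ground truth


def select_all_in_area(points):
    """(b) Every anchor point inside the ground-truth area is positive."""
    return list(points)


def select_center(points, cx, cy):
    """(c) Only the anchor point nearest the ground-truth center is positive."""
    return [min(points, key=lambda p: (p.x - cx) ** 2 + (p.y - cy) ** 2)]


def select_max_score(points):
    """(d) Only the point with the highest classification score is positive."""
    return [max(points, key=lambda p: p.score)]


def select_max_score_plus_iou(points):
    """(e) Only the point with the highest score + IoU is positive.

    Assumption: an unweighted sum; the paper may weight the two terms.
    """
    return [max(points, key=lambda p: p.score + p.iou)]


# Hypothetical anchor points inside one ground-truth box centered at (0, 0).
points = [
    AnchorPoint(0.0, 0.0, score=0.60, iou=0.70),  # geometric center
    AnchorPoint(1.0, 0.0, score=0.90, iou=0.40),  # confident, but poor box
    AnchorPoint(0.0, 1.0, score=0.80, iou=0.85),  # best combined cost
]
```

With these made-up values, strategies (c), (d), and (e) each pick a different point, which shows how adding the IoU term steers the box loss toward the point whose classification and regression agree.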