HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection

Zijian Gu, Jianwei Ma, Yan Huang, Honghao Wei, Zhanye Chen, Hui Zhang, Wei Hong

TL;DR

HGSFusion tackles radar sparsity and Direction-Of-Arrival (DOA) estimation errors in autonomous driving with two modules: the Radar Hybrid Generation Module (RHGM), which generates denser, semantically guided radar points, and the Dual Sync Module (DSM), which propagates radar positional information into image features and adaptively weights the two modalities. Within DSM, Spatial Sync injects radar spatial priors into the image BEV features, and Modality Sync balances the contribution of each modality, yielding fused BEV representations for 3D detection. Experiments on VoD and TJ4DRadSet show improvements over prior state-of-the-art methods of 6.53% in RoI AP and 2.03% in BEV AP, respectively, and ablations support the contributions of hybrid point generation, separated encoding, and the dual-sync fusion strategy. The approach performs well across lighting conditions and object distances, which points to its practical potential for robust multimodal perception in autonomous driving.

Abstract

Millimeter-wave radar plays a vital role in 3D object detection for autonomous driving due to its all-weather and all-lighting-condition capabilities for perception. However, radar point clouds suffer from pronounced sparsity and unavoidable angle estimation errors. To address these limitations, incorporating a camera may partially help mitigate the shortcomings. Nevertheless, the direct fusion of radar and camera data can lead to negative or even opposite effects due to the lack of depth information in images and low-quality image features under adverse lighting conditions. Hence, in this paper, we present the radar-camera fusion network with Hybrid Generation and Synchronization (HGSFusion), designed to better fuse radar potentials and image features for 3D object detection. Specifically, we propose the Radar Hybrid Generation Module (RHGM), which fully considers the Direction-Of-Arrival (DOA) estimation errors in radar signal processing. This module generates denser radar points through different Probability Density Functions (PDFs) with the assistance of semantic information. Meanwhile, we introduce the Dual Sync Module (DSM), comprising spatial sync and modality sync, to enhance image features with radar positional information and facilitate the fusion of distinct characteristics in different modalities. Extensive experiments demonstrate the effectiveness of our approach, outperforming the state-of-the-art methods in the VoD and TJ4DRadSet datasets by $6.53\%$ and $2.03\%$ in RoI AP and BEV AP, respectively. The code is available at https://github.com/garfield-cpp/HGSFusion.
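
To make the idea of generating denser radar points from DOA-aware PDFs more concrete, here is a minimal Python sketch. It is not the authors' implementation: the function name `generate_points`, the polar point representation `(range, azimuth)`, the standard deviation `sigma_deg`, and the uniform half-width `half_width_deg` are all assumptions chosen for illustration. The sketch also assumes that only the azimuth is perturbed while the measured range is kept, which is one simple way to model angle estimation errors.

```python
import math
import random


def generate_points(points, n_per_point=4, pdf="gaussian",
                    sigma_deg=1.0, half_width_deg=2.0, seed=0):
    """Sample extra points around each (range_m, azimuth_rad) radar point.

    Assumption (not from the paper): only azimuth is perturbed and the
    measured range is kept unchanged.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    generated = []
    for range_m, azimuth in points:
        for _ in range(n_per_point):
            if pdf == "gaussian":
                # Angular offset drawn from a zero-mean Gaussian PDF.
                delta = rng.gauss(0.0, math.radians(sigma_deg))
            elif pdf == "uniform":
                # Angular offset drawn uniformly from a symmetric window.
                half = math.radians(half_width_deg)
                delta = rng.uniform(-half, half)
            else:
                raise ValueError(f"unknown pdf: {pdf}")
            generated.append((range_m, azimuth + delta))
    return generated


def to_cartesian(points):
    """Convert (range, azimuth) pairs to (x forward, y left) in metres."""
    return [(r * math.cos(a), r * math.sin(a)) for r, a in points]


# Two hypothetical foreground points: 20 m at +5 degrees, 35 m at -10 degrees.
foreground = [(20.0, math.radians(5.0)), (35.0, math.radians(-10.0))]
dense = generate_points(foreground, n_per_point=4, pdf="uniform")

# Hybrid set: original points plus the generated ones.
hybrid_xy = to_cartesian(foreground + dense)
```

Because the offset is angular, the lateral spread of the generated points grows with range: the same angular window covers a wider arc at 35 m than at 20 m.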

Paper Structure

This paper contains 31 sections, 11 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Illustration of angle estimation errors in obtaining radar point clouds. (a) True points and estimated points are shown in the image. (b) True points and estimated points are shown in the radar DOA estimation. The estimated points fall on the beamforming peaks, deviating from the true points.
  • Figure 2: Overall framework of the proposed HGSFusion. In the radar branch, the RHGM uses raw radar points and images to generate hybrid radar points (generated points, foreground points, and raw radar points are shown in green, orange, and blue, respectively). The hybrid radar points are then encoded and passed through the radar backbone to produce radar BEV features. In the image branch, images are processed through the image backbone and view transformation to produce image BEV features. Subsequently, in DSM, the image and radar features undergo dual sync to obtain fused BEV features for object detection.
  • Figure 3: Point cloud generation in RHGM. Initially, raw radar points are projected onto the image, and points falling inside the mask are selected as foreground points. Subsequently, these foreground points are used to produce a generation probability distribution. Finally, the probability distribution is utilized to create the hybrid radar points composed of raw radar points (points in/out mask), foreground points, and generated Gaussian/uniform points.
  • Figure 4: Different encoding strategies of RHGM. Generated and foreground points share the same encoding scheme.
  • Figure 5: Internal structure of DSM. In Spatial Sync, the position information carried by the radar features is used to enhance the image features. The enhanced image features and the radar features then undergo Modality Sync, resulting in the fused BEV features.
  • ...and 3 more figures
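
The first step described in the Figure 3 caption, projecting radar points onto the image and keeping those that fall inside the mask, can be sketched as follows. This is an illustrative sketch rather than the authors' code: the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the camera-frame convention (x right, y down, z forward), and the helper names `project_point` and `select_foreground` are assumptions, and the mask is a plain nested list of booleans standing in for a segmentation output.

```python
import math


def project_point(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame (x, y, z) point to pixel (u, v).

    Returns None for points at or behind the camera plane.
    """
    x, y, z = point
    if z <= 0.0:
        return None
    return fx * x / z + cx, fy * y / z + cy


def select_foreground(points, mask, fx, fy, cx, cy):
    """Split points into those projecting inside the mask and the rest."""
    height, width = len(mask), len(mask[0])
    foreground, background = [], []
    for point in points:
        pixel = project_point(point, fx, fy, cx, cy)
        inside = False
        if pixel is not None:
            # floor (not int) so slightly negative pixels stay out of bounds.
            col, row = math.floor(pixel[0]), math.floor(pixel[1])
            if 0 <= row < height and 0 <= col < width:
                inside = mask[row][col]
        (foreground if inside else background).append(point)
    return foreground, background


# Toy 4x6 mask with one object covering rows 1-2, columns 2-3.
mask = [[False] * 6 for _ in range(4)]
for row in (1, 2):
    for col in (2, 3):
        mask[row][col] = True

points = [
    (0.0, 0.0, 10.0),   # projects to pixel (3, 2), inside the mask
    (5.0, 0.0, 10.0),   # projects to pixel (4, 2), inside the image but outside the mask
    (0.0, 0.0, -1.0),   # behind the camera, never foreground
]
fg, bg = select_foreground(points, mask, fx=2.0, fy=2.0, cx=3.0, cy=2.0)
```

In this toy setup only the first point is returned as foreground. In a full pipeline, those foreground points would then seed the generation probability distribution mentioned in the caption.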