Depth-aware Fusion Method based on Image and 4D Radar Spectrum for 3D Object Detection
Yue Sun, Yeqiang Qian, Chunxiang Wang, Ming Yang
TL;DR
The paper addresses robust 3D object detection for autonomous driving under adverse weather by fusing depth-aware camera images with 4D millimeter-wave radar spectra. It introduces a BEV fusion framework with polar-aligned attention that combines depth-enriched image features and radar spectral features, and a GAN-based depth generator that synthesizes depth maps from radar spectra when depth sensors are unavailable. It employs multi-scale feature extraction and a compact detection head with Hungarian loss, and demonstrates improvements on the K-Radar dataset, outperforming radar-point-cloud baselines while reducing network complexity. The work advances all-weather perception by leveraging complementary sensing modalities in a cost-effective, robust pipeline, with future directions in radar data preprocessing and improved depth generation.
Abstract
Safety and reliability are crucial for the public acceptance of autonomous driving. Intelligent vehicles must therefore perceive their surroundings accurately and robustly across diverse environments. Millimeter-wave radar, known for its high penetration capability, can operate effectively in adverse weather conditions such as rain, snow, and fog. Traditional 3D millimeter-wave radars provide only range, Doppler, and azimuth information for objects. Although the recent emergence of 4D millimeter-wave radars has added elevation resolution, the radar point clouds remain sparse because Constant False Alarm Rate (CFAR) detection discards returns that fall below its threshold. In contrast, cameras offer rich semantic details but are sensitive to lighting and weather conditions. Hence, this paper leverages these two highly complementary and cost-effective sensors, the 4D millimeter-wave radar and the camera. By integrating 4D radar spectra with depth-aware camera images and employing attention mechanisms, we fuse texture-rich images with depth-rich radar data in the Bird's Eye View (BEV) perspective, enhancing 3D object detection. Additionally, we propose using GAN-based networks to generate depth images from radar spectra in the absence of depth sensors, further improving detection accuracy.
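To illustrate why CFAR processing leaves radar point clouds sparse, the following is a minimal sketch of one-dimensional cell-averaging CFAR. The abstract does not state which CFAR variant the radar uses, so the cell-averaging form, the function name `ca_cfar`, its parameters, and the toy spectrum are all assumptions chosen for illustration.

```python
def ca_cfar(power, num_train=4, num_guard=1, scale=4.0):
    """Cell-averaging CFAR over a 1D power spectrum (illustrative sketch).

    A cell is kept as a detection only if its power exceeds `scale` times
    the mean power of the training cells on both sides of it. Guard cells
    next to the cell under test are excluded from the noise estimate.
    All parameter values here are hypothetical, not taken from the paper.
    """
    detections = []
    half = num_train + num_guard
    for i in range(half, len(power) - half):
        left = power[i - half : i - num_guard]           # num_train cells
        right = power[i + num_guard + 1 : i + half + 1]  # num_train cells
        noise = (sum(left) + sum(right)) / (len(left) + len(right))
        if power[i] > scale * noise:
            detections.append(i)
    return detections


# Toy spectrum: flat noise floor, one strong return and one weak return.
spectrum = [1.0] * 32
spectrum[10] = 20.0  # strong target
spectrum[20] = 3.0   # weak target, above the noise floor but below threshold

detections = ca_cfar(spectrum)
print(detections)
```

In this toy case only the strong return survives the threshold; the weak return is still present in the raw spectrum but is absent from the detection list. This is the kind of information loss that motivates working with radar spectra rather than thresholded point clouds.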
