REOcc: Camera-Radar Fusion with Radar Feature Enrichment for 3D Occupancy Prediction
Chaehee Song, Sanmin Kim, Hyeonjun Jeong, Juyeb Shin, Joonhee Lim, Dongsuk Kum
TL;DR
This work tackles the sparsity and noise of radar data in 3D occupancy prediction by enriching radar features through two modules: a Radar Densifier that redistributes features to neighboring regions using distance-weighted sharing modulated by radar cross-section, and a Radar Amplifier that emphasizes informative channels via an MLP-based weighting scheme. The enriched radar features are fused with multi-view camera BEV features using cross-modal attention and then lifted to 3D via a height-reprojection step for occupancy prediction, all without LiDAR supervision. On Occ3D-nuScenes, REOcc achieves a $\text{mIoU}$ of 45.33, outperforming camera-only baselines and previous fusion methods, with particularly large gains for dynamic objects ($\Delta\text{mIoU}_d$ up to 6.46). This demonstrates that proper radar feature enrichment unlocks the full potential of camera-radar fusion for robust and reliable 3D scene understanding in adverse environments.
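To make the two enrichment ideas concrete, here is a minimal pure-Python sketch of distance-weighted feature sharing modulated by radar cross-section (RCS), followed by MLP-based channel weighting. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the rule that a larger RCS widens the sharing range, the squeeze-and-excitation style gating, and the names `densify`, `amplify`, `radius`, and `base_sigma` are all hypothetical.

```python
import math


def densify(points, height, width, radius=2, base_sigma=1.0):
    """Spread each radar point's feature over nearby BEV cells.

    points: list of (row, col, rcs, feature), where feature is a list of floats.
    Assumption: a Gaussian distance kernel whose width grows with RCS.
    """
    channels = len(points[0][3])
    bev = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    for row, col, rcs, feat in points:
        # Hypothetical modulation: a larger RCS shares the feature more widely.
        sigma = base_sigma * (1.0 + max(rcs, 0.0))
        for r in range(max(0, row - radius), min(height, row + radius + 1)):
            for c in range(max(0, col - radius), min(width, col + radius + 1)):
                dist_sq = (r - row) ** 2 + (c - col) ** 2
                weight = math.exp(-dist_sq / (2.0 * sigma ** 2))
                for k in range(channels):
                    bev[r][c][k] += weight * feat[k]
    return bev


def amplify(bev, w1, b1, w2, b2):
    """Reweight channels with a two-layer MLP applied to global channel means.

    w1 has shape (hidden, channels); w2 has shape (channels, hidden).
    Assumption: ReLU hidden layer and sigmoid gates in (0, 1).
    """
    height, width, channels = len(bev), len(bev[0]), len(bev[0][0])
    cells = height * width
    mean = [
        sum(bev[r][c][k] for r in range(height) for c in range(width)) / cells
        for k in range(channels)
    ]
    hidden = [
        max(0.0, sum(w * m for w, m in zip(row, mean)) + b)
        for row, b in zip(w1, b1)
    ]
    gates = [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
        for row, b in zip(w2, b2)
    ]
    # Scale every cell's channels by the same per-channel gate.
    return [[[v * g for v, g in zip(cell, gates)] for cell in line] for line in bev]


# One radar return at the center of a 5x5 grid with a 2-channel feature.
dense = densify([(2, 2, 0.5, [1.0, 2.0])], height=5, width=5, radius=1)
enriched = amplify(dense, w1=[[0.0, 0.0]], b1=[0.0], w2=[[0.0], [0.0]], b2=[0.0, 0.0])
```

In this sketch the source cell keeps its full feature, its neighbors within `radius` receive attenuated copies, and cells farther away stay empty. In a trained model the gate weights would be learned; the zero weights in the usage lines are placeholders that give every channel a gate of 0.5.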
Abstract
Vision-based 3D occupancy prediction has made significant advancements, but its reliance on cameras alone makes it struggle in challenging environments. This limitation has driven the adoption of sensor fusion, among which camera-radar fusion stands out as a promising solution because the two sensors have complementary strengths. However, the sparsity and noise of radar data limit its effectiveness, leading to suboptimal fusion performance. In this paper, we propose REOcc, a novel camera-radar fusion network designed to enrich radar feature representations for 3D occupancy prediction. Our approach introduces two main components, a Radar Densifier and a Radar Amplifier, which refine radar features by integrating spatial and contextual information, enhancing both their spatial density and their quality. Extensive experiments on the Occ3D-nuScenes benchmark demonstrate that REOcc achieves significant performance gains over the camera-only baseline model, particularly in dynamic object classes. These results underscore REOcc's capability to mitigate the sparsity and noise of radar data. Consequently, radar complements camera data more effectively, unlocking the full potential of camera-radar fusion for robust and reliable 3D occupancy prediction.
