REOcc: Camera-Radar Fusion with Radar Feature Enrichment for 3D Occupancy Prediction

Chaehee Song, Sanmin Kim, Hyeonjun Jeong, Juyeb Shin, Joonhee Lim, Dongsuk Kum

TL;DR

This work tackles the sparsity and noise of radar data in 3D occupancy prediction by enriching radar features through two modules: a Radar Densifier that redistributes features to neighboring regions using distance-weighted sharing modulated by radar cross-section (RCS), and a Radar Amplifier that emphasizes informative channels via an MLP-based weighting scheme. The enriched radar features are fused with multi-view camera BEV features using cross-modal attention and then lifted to 3D via a height-reprojection step for occupancy prediction, all without LiDAR supervision. On Occ3D-nuScenes, REOcc achieves a $mIoU$ of 45.33, outperforming camera-only baselines and previous fusion methods, with particularly large gains for dynamic objects ($\Delta mIoU_d$ up to 6.46). This demonstrates that proper radar feature enrichment unlocks the full potential of camera-radar fusion for robust and reliable 3D scene understanding in adverse environments.
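
The distance-weighted, RCS-modulated sharing described above can be illustrated with a small sketch. This is not the authors' implementation: the function name `densify`, the parameters `base_sigma`, `rcs_gain`, and `radius`, the linear mapping from RCS to Gaussian width, and the additive accumulation are all assumptions made for illustration. The only ideas taken from the summary are that features are shared with neighboring cells, that the share decays with distance, and that RCS modulates the spread.

```python
import math


def densify(points, height, width, base_sigma=0.5, rcs_gain=0.25, radius=2):
    """Spread sparse radar pillar features onto nearby BEV cells.

    points: list of (row, col, rcs, feature) tuples, feature is a list of floats.
    Returns a height x width grid (nested lists) of feature vectors.
    All parameter names and defaults are hypothetical.
    """
    channels = len(points[0][3]) if points else 0
    grid = [[[0.0] * channels for _ in range(width)] for _ in range(height)]

    for row, col, rcs, feature in points:
        # Assumption: a larger RCS suggests a larger object, so the Gaussian
        # is made wider. Negative RCS values are clamped to zero.
        sigma = base_sigma + rcs_gain * max(rcs, 0.0)
        for r in range(max(0, row - radius), min(height, row + radius + 1)):
            for c in range(max(0, col - radius), min(width, col + radius + 1)):
                dist_sq = (r - row) ** 2 + (c - col) ** 2
                # Gaussian weight: 1 at the source cell, decaying with distance.
                weight = math.exp(-dist_sq / (2.0 * sigma ** 2))
                for k in range(channels):
                    # Assumption: overlapping contributions are simply summed.
                    grid[r][c][k] += weight * feature[k]
    return grid


# One radar return at the center of a 7 x 7 BEV grid with a 2-channel feature.
dense = densify([(3, 3, 1.0, [1.0, 2.0])], height=7, width=7)
occupied = sum(1 for row in dense for cell in row if any(cell))
print(occupied, "cells now carry radar features")
```

In this sketch the source cell keeps its original feature (weight 1), cells inside the window receive a decayed copy, and cells outside the window stay empty. Raising `rcs` widens the Gaussian so that neighbors receive a larger share.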

Abstract

Vision-based 3D occupancy prediction has made significant advancements, but methods that rely on cameras alone struggle in challenging environments. This limitation has driven the adoption of sensor fusion, among which camera-radar fusion stands out as a promising solution because the two sensors have complementary strengths. However, the sparsity and noise of radar data limit its effectiveness, leading to suboptimal fusion performance. In this paper, we propose REOcc, a novel camera-radar fusion network designed to enrich radar feature representations for 3D occupancy prediction. Our approach introduces two main components, a Radar Densifier and a Radar Amplifier, which refine radar features by integrating spatial and contextual information, effectively enhancing spatial density and quality. Extensive experiments on the Occ3D-nuScenes benchmark demonstrate that REOcc achieves significant performance gains over the camera-only baseline model, particularly in dynamic object classes. These results underscore REOcc's capability to mitigate the sparsity and noise of radar data. Consequently, radar complements camera data more effectively, unlocking the full potential of camera-radar fusion for robust and reliable 3D occupancy prediction.

Paper Structure

This paper contains 16 sections, 2 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Illustration of our LiDAR-free REOcc. Existing approaches addressing radar's inherent sparse and noisy characteristics rely on LiDAR-based supervision. In contrast, our method achieves radar data processing using radar features alone, without supplementary sensors.
  • Figure 2: Overall architecture of REOcc. The input data from multi-view camera images are processed through a 2D backbone and a view transformation to extract BEV features. In parallel, radar point clouds are fed into a separate 2D backbone to generate radar pillar features. Radar Densifier and Amplifier are then employed to improve both the quantity and quality of radar features, addressing the inherent sparse and noisy characteristics of radar data. Subsequently, the BEV features from the image and radar are fused and then lifted into the 3D volume with additional height information. Finally, an occupancy head is used to predict occupancy from the generated fused voxel features.
  • Figure 3: Detailed structure of the proposed Radar Densifier.
  • Figure 4: Illustration of RCS-based distribution. A Gaussian distribution reflecting the RCS is employed to account for the object's size, as the RCS provides a measure of the object's reflective surface area.
  • Figure 5: Detailed structure of the proposed Radar Amplifier.
  • ...and 2 more figures
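
The Radar Amplifier (Figure 5) is summarized only as an MLP-based scheme that emphasizes informative channels. One common way to realize channel weighting is a squeeze-and-excitation-style gate, sketched below. Whether REOcc pools globally, how many layers it uses, and which activations it applies are not stated here, so the global average pooling, the two-layer MLP, the sigmoid gate, and the names `amplify` and `mlp_channel_weights` are all assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mlp_channel_weights(descriptor, w1, b1, w2, b2):
    """Two-layer MLP: ReLU hidden layer, then one sigmoid gate per channel."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, descriptor)) + b)
              for row, b in zip(w1, b1)]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(w2, b2)]


def amplify(grid, w1, b1, w2, b2):
    """Re-weight the channels of an H x W x C BEV grid (nested lists).

    Returns the re-weighted grid and the per-channel weights.
    The weights w1, b1, w2, b2 would be learned; here they are passed in.
    """
    cells = [cell for row in grid for cell in row]
    channels = len(cells[0])
    # Assumption: summarize each channel by its global average over the grid.
    descriptor = [sum(cell[k] for cell in cells) / len(cells)
                  for k in range(channels)]
    weights = mlp_channel_weights(descriptor, w1, b1, w2, b2)
    # Scale every cell's channels by the shared per-channel weights.
    out = [[[v * g for v, g in zip(cell, weights)] for cell in row]
           for row in grid]
    return out, weights


# Toy 1 x 2 grid with two channels and hand-picked (not learned) MLP weights.
grid = [[[1.0, 2.0], [3.0, 4.0]]]
out, weights = amplify(grid, w1=[[1.0, 0.0]], b1=[0.0],
                       w2=[[1.0], [-1.0]], b2=[0.0, 0.0])
print("channel weights:", weights)
```

With these toy weights the first channel receives a gate above 0.5 and the second a gate below 0.5, which is the sense in which an informative channel is emphasized relative to a less informative one.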