Reliability-Driven LiDAR-Camera Fusion for Robust 3D Object Detection

Reza Sadeghian, Niloofar Hooshyaripour, Chris Joslin, WonSook Lee

TL;DR

This paper tackles robust 3D object detection for autonomous driving under sensor malfunctions by proposing ReliFusion, a BEV-based fusion framework that combines Spatio-Temporal Feature Aggregation (STFA), a Cross-Modality Contrastive Learning–driven Reliability Module, and Confidence-Weighted Mutual Cross-Attention (CW-MCA) to adapt fusion to modality confidence. STFA captures both inter-view spatial dependencies and cross-time dynamics, the Reliability Module produces per-modality confidence scores to quantify data reliability, and CW-MCA uses these scores to dynamically balance LiDAR and camera information during fusion. On the nuScenes dataset, ReliFusion achieves state-of-the-art robustness and accuracy, particularly when LiDAR has a limited field of view or is degraded, outperforming BEVFusion and TransFusion in challenging conditions. The approach advances practical autonomous-driving perception by maintaining accurate BEV detections under adverse sensing scenarios and paves the way for further reliability-aware multimodal fusion research.

Abstract

Accurate and robust 3D object detection is essential for autonomous driving, where fusing data from sensors like LiDAR and camera enhances detection accuracy. However, sensor malfunctions such as corruption or disconnection can degrade performance, and existing fusion models often struggle to maintain reliability when one modality fails. To address this, we propose ReliFusion, a novel LiDAR-camera fusion framework operating in the bird's-eye view (BEV) space. ReliFusion integrates three key components: the Spatio-Temporal Feature Aggregation (STFA) module, which captures dependencies across frames to stabilize predictions over time; the Reliability module, which assigns confidence scores to quantify the dependability of each modality under challenging conditions; and the Confidence-Weighted Mutual Cross-Attention (CW-MCA) module, which dynamically balances information from LiDAR and camera modalities based on these confidence scores. Experiments on the nuScenes dataset show that ReliFusion significantly outperforms state-of-the-art methods, achieving superior robustness and accuracy in scenarios with limited LiDAR fields of view and severe sensor malfunctions.
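The idea of balancing modalities by confidence can be illustrated with a deliberately simplified sketch. This is not the paper's CW-MCA module, which uses mutual cross-attention; it is only a confidence-weighted blend of two BEV feature grids. It assumes the Reliability module has already produced one scalar score per modality. The function names `normalize_confidences` and `fuse_bev` and the softmax normalization are hypothetical choices made for illustration.

```python
import math


def normalize_confidences(c_lidar, c_camera):
    """Softmax over two scalar confidence scores so the weights sum to 1.

    The softmax is an assumption of this sketch, not taken from the paper.
    """
    m = max(c_lidar, c_camera)  # subtract the max for numerical stability
    e_l = math.exp(c_lidar - m)
    e_c = math.exp(c_camera - m)
    total = e_l + e_c
    return e_l / total, e_c / total


def fuse_bev(lidar_feat, camera_feat, c_lidar, c_camera):
    """Blend two equally shaped BEV grids (nested lists, H x W) cell by cell."""
    w_l, w_c = normalize_confidences(c_lidar, c_camera)
    return [
        [w_l * l + w_c * c for l, c in zip(l_row, c_row)]
        for l_row, c_row in zip(lidar_feat, camera_feat)
    ]


if __name__ == "__main__":
    lidar = [[2.0, 4.0], [6.0, 8.0]]
    camera = [[0.0, 0.0], [0.0, 0.0]]
    # Equal confidence: both modalities contribute equally.
    print(fuse_bev(lidar, camera, 1.0, 1.0))
    # Very low LiDAR confidence: the fused grid follows the camera features.
    print(fuse_bev(lidar, camera, -50.0, 0.0))
```

When the LiDAR score collapses, for example under a simulated disconnection, the fused grid falls back almost entirely on camera features, which is the qualitative behavior the abstract attributes to confidence-driven fusion.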

Paper Structure

This paper contains 24 sections, 16 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Illustration of ReliFusion's approach compared to previous methods. (a) Traditional methods rely on fixed fusion mechanisms, struggling under sensor malfunctions. (b) ReliFusion introduces a Reliability Module, incorporating contrastive and confidence modules, to assign confidence scores for LiDAR ($C_{\text{LiDAR}}$) and camera ($C_{\text{Camera}}$). These scores enable dynamic balancing of LiDAR and camera contributions within the fusion module, achieving robust and accurate 3D object detection even under challenging conditions.
  • Figure 2: The overall architecture of ReliFusion.
  • Figure 3: Qualitative detection results of BEVFusion and ReliFusion under LiDAR malfunction scenarios. BEVFusion struggles when LiDAR input is unavailable, whereas ReliFusion relies on the camera to compensate and still detects these objects. Green and orange bounding boxes denote true-positive detections and ground truth, respectively.