MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection

Till Beemelmanns; Quan Zhang; Christian Geller; Lutz Eckstein

TL;DR

The paper addresses the vulnerability of LiDAR-camera fusion for 3D object detection under real-world corruptions by introducing MultiCorrupt, an open-source benchmark that applies ten corruption types to LiDAR and camera data at three severity levels. It defines three metrics, Resistance Ability ($RA$), mean Resistance Ability ($mRA$), and Relative Resistance Ability ($RRA$), to quantify performance degradation relative to clean data and, for $RRA$, relative to a baseline model. Benchmarking five state-of-the-art detectors (CMT, SparseFusion, TransFusion, DeepInteraction, BEVFusion) on nuScenes shows that robustness depends strongly on fusion strategy, alignment, and training regime; designs such as independent modality handling and masked-modal training offer greater resilience. The findings give practical guidance for building robust multi-modal perception systems and establish an open benchmark for further research on robust 3D perception in autonomous driving.

Abstract

Multi-modal 3D object detection models for automated driving have demonstrated exceptional performance on computer vision benchmarks like nuScenes. However, their reliance on densely sampled LiDAR point clouds and meticulously calibrated sensor arrays poses challenges for real-world applications. Issues such as sensor misalignment, miscalibration, and disparate sampling frequencies lead to spatial and temporal misalignment in data from LiDAR and cameras. Additionally, the integrity of LiDAR and camera data is often compromised by adverse environmental conditions such as inclement weather, leading to occlusions and noise interference. To address this challenge, we introduce MultiCorrupt, a comprehensive benchmark designed to evaluate the robustness of multi-modal 3D object detectors against ten distinct types of corruptions. We evaluate five state-of-the-art multi-modal detectors on MultiCorrupt and analyze their performance in terms of their resistance ability. Our results show that existing methods exhibit varying degrees of robustness depending on the type of corruption and their fusion strategy. We provide insights into which multi-modal design choices make such models robust against certain perturbations. The dataset generation code and benchmark are open-sourced at https://github.com/ika-rwth-aachen/MultiCorrupt.

Paper Structure (14 sections, 3 equations, 4 figures, 4 tables)

Introduction
Related Work
    Monocular 3D Object Detection
    Multi-View 3D Object Detection
    Multi-Modal 3D Object Detection
    Robustness of 3D Object Detection
Method
    MultiCorrupt: Multi-Modal Corrupted Dataset
    Evaluation metrics
    Benchmarking Existing Multi-Modal 3D Object Detectors
Evaluation
    Resistance Ability & Relative Resistance Ability
    Analysis
Conclusion

Figures (4)

Figure 1: MultiCorrupt: A benchmark of state-of-the-art LiDAR-camera 3D detection methods under corruption. (a) We introduce ten different multi-modal corruptions and (b) provide a comprehensive benchmark and analysis of top-performing detection models under these data perturbations.
Figure 2: Visualization of corrupted LiDAR and camera data. (a)-(c) We display corrupted sensor data for Fog, wherein the maximum range and intensity of the LiDAR, as well as the camera image quality, degrade progressively with higher severity levels. (d)-(f) Motion Blur impacts both the camera and the LiDAR, potentially arising from motion, vibration, and the rolling shutter effect of the sensors.
Figure 3: Robustness for all corruptions and severity levels. $\text{RA}_{c,s}$ for different severity levels computed using the NDS score.
Figure 4: Relative robustness visualization. $\text{RRA}_c$ computed with NDS using BEVFusion (Liu et al., 2022) as baseline.
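
The captions for Figures 3 and 4 refer to $\text{RA}_{c,s}$ and $\text{RRA}_c$ computed from NDS. The page does not reproduce the paper's equations, so the following is only a minimal sketch under assumed definitions: RA is taken as the corrupted score divided by the clean score, mRA as the average of RA over severity levels and then over corruptions, and RRA as the percentage change of a model's summed scores relative to a baseline model's summed scores for the same corruption. The function names and all numbers are hypothetical and chosen purely for illustration; they are not results from the benchmark.

```python
from statistics import mean


def resistance_ability(corrupt_score: float, clean_score: float) -> float:
    """Assumed RA_{c,s}: score under corruption c at severity s over the clean score."""
    return corrupt_score / clean_score


def mean_resistance_ability(scores: dict[str, list[float]], clean_score: float) -> float:
    """Assumed mRA: average RA over severities per corruption, then over corruptions."""
    per_corruption = [
        mean(resistance_ability(s, clean_score) for s in severity_scores)
        for severity_scores in scores.values()
    ]
    return mean(per_corruption)


def relative_resistance_ability(model_scores: list[float], baseline_scores: list[float]) -> float:
    """Assumed RRA_c in percent: summed model scores relative to summed baseline scores."""
    return (sum(model_scores) / sum(baseline_scores) - 1.0) * 100.0


# Illustrative NDS values only (not taken from the paper).
clean_nds = 0.70
model_nds = {
    "fog": [0.63, 0.56, 0.49],            # severity levels 1 to 3
    "motion_blur": [0.665, 0.63, 0.595],
}
baseline_fog_nds = [0.60, 0.50, 0.40]     # hypothetical baseline model under fog

mra = mean_resistance_ability(model_nds, clean_nds)
rra_fog = relative_resistance_ability(model_nds["fog"], baseline_fog_nds)
print(f"mRA = {mra:.3f}, RRA(fog) = {rra_fog:.1f}%")
```

Under these assumed definitions, an RA of 1 means no degradation relative to clean data, and a positive RRA means the model retains more absolute performance under that corruption than the baseline does.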
