ContextualFusion: Context-Based Multi-Sensor Fusion for 3D Object Detection in Adverse Operating Conditions

Shounak Sural, Nishad Sahu, Ragunathan Rajkumar

TL;DR

ContextualFusion addresses the robustness gap of multimodal 3D object detection in adverse weather and low-light conditions by introducing context-guided gating in a BEV fusion framework. The method uses a Gated Convolutional Fusion (GatedConv) mechanism that weights camera and lidar features based on day/night and rain context, and introduces AdverseOp3D, a CARLA-generated dataset that balances adverse-condition scenarios. Empirical results show substantial gains, including an 11.7% mAP improvement at night on NuScenes and a 6.2% improvement on AdverseOp3D, with a competitive inference speed of around 9.7 FPS. The work contributes a practical, context-aware fusion approach and provides an open synthetic dataset to advance robust, all-weather autonomous perception.

Abstract

The fusion of multimodal sensor data streams such as camera images and lidar point clouds plays an important role in the operation of autonomous vehicles (AVs). Robust perception across a range of adverse weather and lighting conditions is specifically required for AVs to be deployed widely. While multi-sensor fusion networks have been previously developed for perception in sunny and clear weather conditions, these methods show a significant degradation in performance under night-time and poor weather conditions. In this paper, we propose a simple yet effective technique called ContextualFusion to incorporate the domain knowledge about cameras and lidars behaving differently across lighting and weather variations into 3D object detection models. Specifically, we design a Gated Convolutional Fusion (GatedConv) approach for the fusion of sensor streams based on the operational context. To aid in our evaluation, we use the open-source simulator CARLA to create a multimodal adverse-condition dataset called AdverseOp3D to address the shortcomings of existing datasets being biased towards daytime and good-weather conditions. Our ContextualFusion approach yields an mAP improvement of 6.2% over state-of-the-art methods on our context-balanced synthetic dataset. Finally, our method enhances state-of-the-art 3D object detection performance at night on the real-world NuScenes dataset with a significant mAP improvement of 11.7%.
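
The idea of gating sensor streams by operational context can be illustrated with a small sketch. This is not the paper's GatedConv layer, whose exact design is not given in this summary. It is a minimal, hypothetical illustration that assumes a binary (night, rain) context, one scalar gate per modality, and hand-picked gate weights in place of learned ones. The function names `context_gates` and `gated_fuse` are invented for this example.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def context_gates(
    is_night: bool,
    is_rain: bool,
    w_cam: Sequence[float] = (-2.0, -1.0),   # assumed: camera trusted less at night and in rain
    w_lidar: Sequence[float] = (0.5, -1.0),  # assumed: lidar trusted slightly more at night
    bias: Sequence[float] = (1.5, 1.5),
) -> Tuple[float, float]:
    """Map a binary (night, rain) context to one gate per modality.

    All weights are illustrative placeholders, not learned values
    and not values taken from the paper.
    """
    ctx = (float(is_night), float(is_rain))
    g_cam = sigmoid(bias[0] + sum(w * c for w, c in zip(w_cam, ctx)))
    g_lidar = sigmoid(bias[1] + sum(w * c for w, c in zip(w_lidar, ctx)))
    return g_cam, g_lidar


def gated_fuse(
    cam_feat: Sequence[float],
    lidar_feat: Sequence[float],
    is_night: bool,
    is_rain: bool,
) -> List[float]:
    """Scale each modality's feature vector by its gate, then concatenate.

    In a real BEV fusion network the inputs would be feature maps and a
    learned convolution would typically follow the concatenation.
    """
    g_cam, g_lidar = context_gates(is_night, is_rain)
    return [g_cam * v for v in cam_feat] + [g_lidar * v for v in lidar_feat]


if __name__ == "__main__":
    cam, lidar = [1.0, 2.0], [3.0, 4.0]
    print("day  :", gated_fuse(cam, lidar, is_night=False, is_rain=False))
    print("night:", gated_fuse(cam, lidar, is_night=True, is_rain=False))
```

With these placeholder weights, the camera gate shrinks at night while the lidar gate does not, which mirrors the domain knowledge the abstract describes: the two sensors respond differently to lighting and weather.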

Paper Structure (18 sections, 1 equation, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Comparison of NuScenes ground truth (top row) with predictions from existing state-of-the-art models (middle row) vs our ContextualFusion predictions (bottom row) on night-time NuScenes data
  • Figure 2: Bounding box ground truth generated from CARLA in lidar and camera views corresponding to the NuScenes sensor configurations. The image brightness is enhanced for better visibility.
  • Figure 3: The distribution of adverse operating conditions in our AdverseOp3D dataset in comparison to the NuScenes dataset.
  • Figure 4: Our ContextualFusion Model Architecture
  • Figure 5: Visualization of lidar and camera features before and after fusion
  • ...and 3 more figures