Evaluating the Impact of Weather-Induced Sensor Occlusion on BEVFusion for 3D Object Detection

Sanjay Kumar, Tim Brophy, Eoin Martino Grua, Ganesh Sistu, Valentina Donzella, Ciaran Eising

TL;DR

This work evaluates how weather- or obstruction-induced occlusions affect BEVFusion-based 3D object detection on nuScenes. It introduces controlled occlusions by applying WoodScape-derived soiling masks to camera images and random point dropout to LiDAR, and assesses performance with mean Average Precision (mAP) and the nuScenes Detection Score (NDS) without retraining. The study finds that camera occlusion predominantly degrades camera-only performance, while LiDAR occlusion severely degrades LiDAR-only accuracy. In fusion, camera occlusion yields a small drop, whereas LiDAR occlusion causes a larger decline, underscoring the model's reliance on LiDAR for 3D localization. The results motivate occlusion-aware evaluation and future fusion strategies, including occlusion-aware training and temporal reasoning, to preserve detection accuracy under partial sensor failure or adverse weather.

Abstract

Accurate 3D object detection is essential for automated vehicles to navigate safely in complex real-world environments. Bird's Eye View (BEV) representations, which project multi-sensor data into a top-down spatial format, have emerged as a powerful approach for robust perception. Although BEV-based fusion architectures have demonstrated strong performance through multimodal integration, the effects of sensor occlusions, caused by environmental conditions such as fog, haze, or physical obstructions, on 3D detection accuracy remain underexplored. In this work, we investigate the impact of occlusions on both camera and Light Detection and Ranging (LiDAR) outputs using the BEVFusion architecture, evaluated on the nuScenes dataset. Detection performance is measured using mean Average Precision (mAP) and the nuScenes Detection Score (NDS). Our results show that moderate camera occlusions lead to a 41.3% drop in mAP (from 35.6% to 20.9%) when detection is based only on the camera. On the other hand, LiDAR sharply drops in performance only under heavy occlusion, with mAP falling by 47.3% (from 64.7% to 34.1%), with a severe impact on long-range detection. In fused settings, the effect depends on which sensor is occluded: occluding the camera leads to a minor 4.1% drop (from 68.5% to 65.7%), while occluding LiDAR results in a larger 26.8% drop (to 50.1%), revealing the model's stronger reliance on LiDAR for the task of 3D object detection. Our results highlight the need for future research into occlusion-aware evaluation methods and improved sensor fusion techniques that can maintain detection accuracy in the presence of partial sensor failure or degradation due to adverse environmental conditions.
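The percentage drops quoted in the abstract are relative to the clean baseline, not differences in absolute mAP points. The short sketch below recomputes them from the mAP pairs given above; the function name `relative_drop` and the dictionary labels are illustrative and not taken from the paper.

```python
def relative_drop(clean: float, occluded: float) -> float:
    """Relative drop in percent: 100 * (clean - occluded) / clean."""
    if clean <= 0:
        raise ValueError("clean score must be positive")
    return 100.0 * (clean - occluded) / clean


# (clean mAP, occluded mAP) in percent, as quoted in the abstract.
reported = {
    "camera-only, occluded camera": (35.6, 20.9),
    "LiDAR-only, occluded LiDAR": (64.7, 34.1),
    "fusion, occluded camera": (68.5, 65.7),
    "fusion, occluded LiDAR": (68.5, 50.1),
}

for setting, (clean, occluded) in reported.items():
    drop = relative_drop(clean, occluded)
    print(f"{setting}: {clean - occluded:.1f} points absolute, {drop:.1f}% relative")
```

Recomputing from the rounded mAP values reproduces the quoted drops to within about 0.1 percentage points. Small differences of that size are presumably due to rounding of the reported scores.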

Paper Structure

This paper contains 28 sections, 9 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: From left to right: the original nuScenes image, followed by multiple binary occlusion patterns from the WoodScape soiling dataset. A randomly selected binary mask is smoothed using a Gaussian filter and applied to the original image to produce an occluded nuScenes image.
  • Figure 2: From left to right: original nuScenes point clouds, simulated environmental conditions (e.g., rain, fog, snow), a random sampling step that removes some points, and the resulting occluded point clouds.
  • Figure 3: Overview of the BEVFusion architecture used in our study. From left to right: multi-view camera and LiDAR inputs (including occluded variants) are encoded and transformed into BEV space, followed by sensor fusion, encoding, and 3D object detection under occlusion.
  • Figure 4: Qualitative comparison of BEVFusion predictions under LiDAR occlusion. From top to bottom: (a) Ground truth, (b) Prediction with clean camera input, (c) Prediction with clean LiDAR input, (d) Prediction with occluded LiDAR, and (e) Prediction with clean camera + occluded LiDAR. For each row, the left images show the multi-view camera inputs, with the 3D detection predictions overlaid. The right image displays the input LiDAR scan, with the predictions overlaid in BEV space.
  • Figure 5: Qualitative comparison of BEVFusion predictions under camera occlusion. (a) Ground truth, (b) Prediction with clean camera, (c) Prediction with occluded camera, and (d) Prediction with occluded camera + clean LiDAR. For each row, the left images show the multi-view camera inputs, with the 3D detection predictions overlaid. The right image displays the input LiDAR scan, with the predictions overlaid in BEV space.
  • ...and 1 more figures
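
The occlusion pipelines described in the captions of Figures 1 and 2 can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the blending rule (alpha-blending towards a uniform dark colour), the Gaussian width `sigma`, the `drop_ratio` parameter, and all function names are hypothetical, and the LiDAR part shows only uniform random point removal rather than any weather simulation.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalised 1D Gaussian kernel truncated at roughly 3 sigma."""
    radius = max(1, int(3.0 * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def smooth_mask(mask: np.ndarray, sigma: float) -> np.ndarray:
    """Blur a binary (H, W) mask with a separable Gaussian; output in [0, 1]."""
    kernel = gaussian_kernel_1d(sigma)
    pad = len(kernel) // 2
    padded = np.pad(mask.astype(np.float64), pad, mode="edge")
    # Filter along rows, then along columns ("valid" undoes the padding).
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    both = np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)
    return np.clip(both, 0.0, 1.0)


def occlude_image(image: np.ndarray, mask: np.ndarray,
                  sigma: float = 5.0, soil_value: float = 0.0) -> np.ndarray:
    """Blend a uint8 (H, W, 3) image towards soil_value where mask == 1.

    Assumption: mask value 1 marks soiled pixels; the paper's exact
    blending rule is not given in this excerpt.
    """
    alpha = smooth_mask(mask, sigma)[..., None]  # (H, W, 1), broadcast over RGB
    blended = (1.0 - alpha) * image.astype(np.float64) + alpha * soil_value
    return np.rint(np.clip(blended, 0.0, 255.0)).astype(image.dtype)


def drop_points(points: np.ndarray, drop_ratio: float,
                rng: np.random.Generator) -> np.ndarray:
    """Remove a fixed fraction of points uniformly at random, keeping order."""
    if not 0.0 <= drop_ratio <= 1.0:
        raise ValueError("drop_ratio must lie in [0, 1]")
    n_points = points.shape[0]
    n_keep = int(round(n_points * (1.0 - drop_ratio)))
    keep = np.sort(rng.choice(n_points, size=n_keep, replace=False))
    return points[keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = np.full((64, 96, 3), 200, dtype=np.uint8)   # stand-in camera frame
    mask = np.zeros((64, 96), dtype=np.uint8)
    mask[16:48, 24:72] = 1                               # stand-in soiling mask
    occluded = occlude_image(image, mask, sigma=3.0)
    cloud = rng.random((1000, 4))                        # stand-in x, y, z, intensity
    sparse = drop_points(cloud, drop_ratio=0.5, rng=rng)
    print(occluded.shape, sparse.shape)
```

Smoothing the binary mask before blending gives soft edges rather than hard cut-outs, which matches the Gaussian-filtering step named in the Figure 1 caption. Dropping a fixed fraction of points, rather than sampling each point independently, makes the occlusion level exactly reproducible for a given `drop_ratio`.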