SemOD: Semantic Enabled Object Detection Network under Various Weather Conditions

Aiyinsi Zuo, Zhaoliang Zheng

TL;DR

SemOD addresses robust camera-based object detection under diverse weather by integrating semantic priors into both the image restoration and detection stages. The architecture has two units: the PPU for weather-aware image refinement and the DTU for semantically informed detection. It employs an Attention Embedded Decoder and a Domain Adaptation Block to fuse semantic maps (from HRNet) with a YOLO-based detector. The approach achieves consistent improvements in COCO-style mAP across fog, rain, snow, and clear conditions, with notable gains in snowy scenes (up to 8.8 percentage points), while keeping inference close to real time on standard GPUs. By open-sourcing datasets and code, the work provides a practical, semantically guided framework for all-weather perception in autonomous driving.

Abstract

In autonomous driving, camera-based perception models are mostly trained on clear-weather data. Models designed for a specific weather challenge cannot adapt to changing weather conditions and primarily prioritize weather removal. Our study introduces a semantic-enabled network for object detection in diverse weather conditions. In our analysis, semantic information enables the model to generate plausible content for missing areas, understand object boundaries, and preserve visual coherence and realism across both filled-in and existing portions of the image, all of which benefit image transformation and object recognition. Specifically, our architecture consists of a Preprocessing Unit (PPU) and a Detection Unit (DTU): the PPU uses a U-shaped network enriched with semantics to refine degraded images, and the DTU integrates this semantic information for object detection using a modified YOLO network. Our method pioneers the use of semantic data for all-weather transformations, resulting in an mAP increase of 1.47% to 8.80% over existing methods across benchmark datasets covering different weather conditions. This highlights the potency of semantics in image enhancement and object detection, offering a comprehensive approach to improving object detection performance. Code will be available at https://github.com/EnisZuo/SemOD.
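The two-unit data flow described in the abstract can be illustrated with a minimal shape-level sketch. This is not the authors' implementation: `semantic_prior`, `preprocessing_unit`, `detection_unit`, and `run_pipeline` are hypothetical names, the segmentation model is replaced by a toy softmax over pixel intensity, the U-shaped network is replaced by a single random 1x1 projection, and fusing semantics by channel-wise concatenation in both units is an assumption made only for illustration.

```python
import numpy as np


def semantic_prior(image: np.ndarray, num_classes: int = 4) -> np.ndarray:
    """Hypothetical stand-in for a pretrained segmentation network.

    Returns per-pixel class probabilities with shape (num_classes, H, W).
    """
    gray = image.mean(axis=0)  # (H, W)
    # Toy logits from pixel intensity; a real model would predict these.
    logits = np.stack(
        [-(gray - k / num_classes) ** 2 for k in range(num_classes)]
    )
    exp = np.exp(logits - logits.max(axis=0, keepdims=True))
    return exp / exp.sum(axis=0, keepdims=True)  # softmax over classes


def preprocessing_unit(image: np.ndarray, semantics: np.ndarray) -> np.ndarray:
    """Placeholder PPU: fuse image and semantics, predict a residual."""
    # Assumed fusion: channel-wise concatenation -> (3 + K, H, W).
    fused = np.concatenate([image, semantics], axis=0)
    rng = np.random.default_rng(0)
    # A random 1x1 projection stands in for the U-shaped network.
    weights = rng.normal(scale=0.1, size=(3, fused.shape[0]))
    residual = np.einsum("oc,chw->ohw", weights, fused)
    return np.clip(image + residual, 0.0, 1.0)


def detection_unit(restored: np.ndarray, semantics: np.ndarray) -> np.ndarray:
    """Placeholder DTU: build the semantically enriched detector input.

    A real DTU would pass this tensor through the modified YOLO network.
    """
    return np.concatenate([restored, semantics], axis=0)


def run_pipeline(image: np.ndarray):
    """Run degraded image -> semantics -> restored image -> detector input."""
    semantics = semantic_prior(image)
    restored = preprocessing_unit(image, semantics)
    detector_input = detection_unit(restored, semantics)
    return semantics, restored, detector_input


if __name__ == "__main__":
    degraded = np.random.default_rng(1).random((3, 32, 48))  # (C, H, W)
    sem, restored, det_in = run_pipeline(degraded)
    print(sem.shape, restored.shape, det_in.shape)
```

The point of the sketch is only the ordering and the tensor shapes: the semantic map is computed once from the degraded input and reused by both units, so the detector sees the restored image together with the same semantic channels that guided the restoration.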

Paper Structure

This paper contains 21 sections, 8 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Overall Model Structure. An input degraded image first passes through a pre-trained segmentation prior module that extracts semantic information; our Preprocessing Unit (PPU) then combines this semantic information with its own features to generate a restored image. The restored image is then fed into the Detection Unit (DTU) for object detection. ⓒ represents feature map concatenation. "c" stands for channel.
  • Figure 2: Innovative Module Structure. We selectively visualize the modules that mark the key innovations of our model: the attention embedded decoder (AED) in the Preprocessing Unit and the domain adaptation block (DAB) in the Detection Unit. In the figure, "BN" stands for batch normalization.
  • Figure 3: Qualitative results on the validation sets of four distinct weather datasets. The first row shows foggy weather, the second rainy weather, the third snowy weather, and the last sunny weather. (a) Original dataset images. (b) Output of the suboptimal model (TransWeather + YOLOv11). (c) Our method's results. Both (b) and (c) include the weather removal effect and the detection results. Solid bounding boxes are the final detection results, and different colors represent different classes. The highlighted dashed red bounding boxes mark detections that are wrong compared with (c).