
SUSTechGAN: Image Generation for Object Detection in Adverse Conditions of Autonomous Driving

Gongjin Lan, Yang Peng, Qi Hao, Chengzhong Xu

TL;DR

This work addresses the challenge of data scarcity for autonomous driving in adverse conditions by proposing SUSTechGAN, a GAN-based framework that uses a dual attention module (PAM and CAM), multi-scale generators, and a detection-guided loss to generate driving images with strong local and global semantic cues. The loss blends $L_{det}$, $L_{adv}$, and $L_{cyc}$ as $L_{total}=k_1 L_{det}+k_2 L_{adv}+k_3 L_{cyc}$ with $k_1=0.8$, $k_2=1$, $k_3=10$, where $L_{det}=a L_{CIoU}+b L_{cls}+c L_{conf}$ and $a=0.4$, $b=0.3$, $c=0.3$, leveraging a pre-trained YOLOv5 detector. Empirically, SUSTechGAN outperforms CycleGAN, UNIT, and MUNIT in both image-quality metrics (FID/KID) and downstream object-detection performance (mAP) when augmenting training data for YOLOv5 on datasets such as BDD100k-adv, AllRain, and ACDC. The approach yields tangible improvements in detecting objects under rain and night conditions, demonstrating the practical value of detection-guided image synthesis for robust autonomous driving perception. The work contributes a validated framework and open-source resources to facilitate further research in adverse-condition image generation for autonomous driving.
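The weighted loss described above can be illustrated with a minimal sketch. It only shows how the scalar terms combine under the stated weights. The names `LossWeights`, `detection_loss`, and `total_loss` are hypothetical and not taken from the authors' code, and the inputs are assumed to be loss values already computed elsewhere (for example, by the pre-trained YOLOv5 detector and the GAN discriminators).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    """Weights quoted in the TL;DR; the field names are illustrative assumptions."""
    k_det: float = 0.8   # k_1, weight of the detection loss L_det
    k_adv: float = 1.0   # k_2, weight of the adversarial loss L_adv
    k_cyc: float = 10.0  # k_3, weight of the cycle loss L_cyc
    a_ciou: float = 0.4  # a, weight of L_CIoU inside L_det
    b_cls: float = 0.3   # b, weight of L_cls inside L_det
    c_conf: float = 0.3  # c, weight of L_conf inside L_det


def detection_loss(l_ciou, l_cls, l_conf, w=LossWeights()):
    """L_det = a * L_CIoU + b * L_cls + c * L_conf."""
    return w.a_ciou * l_ciou + w.b_cls * l_cls + w.c_conf * l_conf


def total_loss(l_det, l_adv, l_cyc, w=LossWeights()):
    """L_total = k_1 * L_det + k_2 * L_adv + k_3 * L_cyc."""
    return w.k_det * l_det + w.k_adv * l_adv + w.k_cyc * l_cyc


if __name__ == "__main__":
    # Placeholder loss values, chosen only to exercise the formula.
    l_det = detection_loss(l_ciou=0.5, l_cls=0.2, l_conf=0.1)
    print(total_loss(l_det=l_det, l_adv=0.7, l_cyc=0.05))
```

In a real training loop these terms would typically be framework tensors rather than Python floats; the arithmetic is the same because the combination is a plain weighted sum.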

Abstract

Autonomous driving benefits significantly from data-driven deep neural networks. However, the data in autonomous driving typically follows a long-tailed distribution, in which critical driving data in adverse conditions is hard to collect. Although generative adversarial networks (GANs) have been applied to augment data for autonomous driving, generating driving images in adverse conditions is still challenging. In this work, we propose a novel framework, SUSTechGAN, with customized dual attention modules, multi-scale generators, and a novel loss function to generate driving images for improving object detection of autonomous driving in adverse conditions. We test SUSTechGAN and well-known GANs on generating driving images in the adverse conditions of rain and night, and apply the generated images to retrain object detection networks. Specifically, we add the generated images to the training datasets, retrain the well-known YOLOv5, and evaluate the improvement of the retrained YOLOv5 for object detection in adverse conditions. The experimental results show that the driving images generated by SUSTechGAN significantly improve the performance of the retrained YOLOv5 in rain and night conditions, outperforming the images generated by the well-known GANs. The open-source code, video description, and datasets are available on the project page (footnote 1 of the paper) to facilitate image generation development in autonomous driving under adverse conditions.

Paper Structure (28 sections, 8 equations, 9 figures, 5 tables)

This paper contains 28 sections, 8 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: The framework of our SUSTechGAN contains dual attention modules, multi-scale generators, and the components of the loss function. The dual attention module contains a position attention module (PAM, see Figure 2 for the detailed architecture) and a channel attention module (CAM, see Figure 3 for the detailed architecture). $Loss$, $L_{det}$, $L_{adv}$, and $L_{cyc}$ represent the loss function, detection loss, adversarial loss, and consistency loss, respectively.
  • Figure 2: The framework of Position Attention Module (PAM) in our dual attention module.
  • Figure 3: The framework of Channel Attention Module (CAM).
  • Figure 4: The framework of multi-scale generators in our SUSTechGAN, where $k$ represents the kernel size and $s$ represents the convolution stride.
  • Figure 5: The feature results after position attention module and channel attention module.
  • ...and 4 more figures