CCSPNet-Joint: Efficient Joint Training Method for Traffic Sign Detection Under Extreme Conditions

Haoqin Hong, Yue Zhou, Xiangyu Shu, Xiaofang Hu

TL;DR

This work proposes CCSPNet, an efficient feature extraction module based on Contextual Transformer and CNN, capable of effectively utilizing the static and dynamic features of images, achieving faster inference speed and providing stronger feature enhancement capabilities.

Abstract

Traffic sign detection is an important research direction in intelligent driving. Unfortunately, existing methods often overlook extreme conditions such as fog, rain, and motion blur. Moreover, the end-to-end training strategy for image denoising and object detection models fails to utilize inter-model information effectively. To address these issues, we propose CCSPNet, an efficient feature extraction module based on Contextual Transformer and CNN, capable of effectively utilizing the static and dynamic features of images, achieving faster inference speed and providing stronger feature enhancement capabilities. Furthermore, we establish the correlation between object detection and image denoising tasks and propose a joint training model, CCSPNet-Joint, to improve data efficiency and generalization. Finally, to validate our approach, we create the CCTSDB-AUG dataset for traffic sign detection in extreme scenarios. Extensive experiments have shown that CCSPNet achieves state-of-the-art performance in traffic sign detection under extreme conditions. Compared to end-to-end methods, CCSPNet-Joint achieves a 5.32% improvement in precision and an 18.09% improvement in mAP@.5.
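For readers unfamiliar with the metric, mAP@.5 conventionally denotes mean average precision where a predicted box counts as correct only if its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below illustrates that matching criterion only. It is not taken from the paper: the `(x1, y1, x2, y2)` box format and the function names `iou` and `is_true_positive` are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: a prediction matches a ground truth at the given IoU threshold."""
    return iou(pred_box, gt_box) >= threshold
```

Under this criterion, a predicted sign box shifted by half its width relative to the ground truth (IoU of one third) would not count as a correct detection at the 0.5 threshold.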

Paper Structure (17 sections, 8 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: The feature enhancement module that we proposed: (a) Architecture of our feature enhancement module. (b) The structure of the CCSPNet. (c) The structure of the CoT.
  • Figure 2: The object detection model YOLO-CCSPNet in this article: (a) a framework for one-stage object detection, (b) a backbone network based on EfficientViT, and (c) a Neck module based on CCSPNet.
  • Figure 3: The proposed joint training method CCSPNet-Joint for traffic sign detection in extreme conditions.
  • Figure 4: Experimental dataset: (a) Original images from CCTSDB, (b-d) Augmented images from CCTSDB-AUG with rain, fog, and motion blur, (e) Images processed by 4kDehazing in CCTSDB-AUG for rain removal, fog removal, and motion blur removal, arranged from top to bottom.
  • Figure 5: The convergence speed and accuracy comparison of YOLO-CCSPNet and CCSPNet-Joint.
  • ...and 1 more figure