EMDFNet: Efficient Multi-scale and Diverse Feature Network for Traffic Sign Detection
Pengyu Li, Chenhe Liu, Tengfei Li, Xinyu Wang, Shihui Zhang, Dongyang Yu
TL;DR
EMDFNet tackles feature singularity (limited diversity in the extracted features) and weak multi-scale fusion in traffic sign detection (TSD) by introducing an Augmented Shortcut Module (ASM) and an Efficient Hybrid Encoder (EHE), built atop a Res2Net backbone and trained with SIoU loss. The approach diversifies feature representations and enhances cross-scale fusion, enabling robust small-object detection while preserving real-time, single-stage inference. Experiments on TT100K and GTSDB demonstrate state-of-the-art $mAP$ and strong $AP_s$, with competitive $FPS$ and fewer parameters. The results underscore the effectiveness of diversified feature pathways and cross-scale integration for reliable TSD in complex driving scenes, and point toward further lightweight optimizations for practical deployment.
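To make the overlap measure behind the box-regression loss and the $AP_s$ metric concrete, the following is a minimal sketch of plain intersection-over-union (IoU) and the corresponding `1 - IoU` loss. It is not the SIoU loss used by EMDFNet: SIoU adds further penalty terms on top of an IoU-based loss, and those are omitted here. The function names `iou` and `iou_loss` and the `(x1, y1, x2, y2)` box format are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative only: this is ordinary IoU, not the SIoU variant.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


def iou_loss(pred_box, target_box):
    """Simple IoU-based regression loss: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - iou(pred_box, target_box)


# The same 4-pixel horizontal shift applied to a small and a large box.
small_iou = iou((0, 0, 16, 16), (4, 0, 20, 16))
large_iou = iou((0, 0, 64, 64), (4, 0, 68, 64))
```

In this sketch, a 4-pixel shift lowers the IoU of a 16×16 box far more than that of a 64×64 box, which illustrates why localization of small traffic signs is particularly sensitive to the choice of regression loss.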
Abstract
The detection of small objects, particularly traffic signs, is a critical subtask in object detection and autonomous driving. Despite notable progress in previous research, two primary challenges persist. First, feature extraction lacks diversity, so the extracted features tend to be homogeneous. Second, existing detectors do not effectively integrate features across objects of varying sizes or scales. These issues are also prevalent in generic object detection. Motivated by these challenges, we propose a novel object detection network for traffic sign detection, the Efficient Multi-scale and Diverse Feature Network (EMDFNet), which integrates an Augmented Shortcut Module and an Efficient Hybrid Encoder to address both issues simultaneously. Specifically, the Augmented Shortcut Module uses multiple branches to integrate spatial and channel semantic information, thereby enhancing feature diversity. The Efficient Hybrid Encoder combines global feature fusion with local feature interaction across these features, adaptively integrating feature information to generate distinctive classification features. Extensive experiments on the Tsinghua-Tencent 100K (TT100K) benchmark and the German Traffic Sign Detection Benchmark (GTSDB) demonstrate that EMDFNet outperforms other state-of-the-art detectors while retaining the real-time processing capabilities of single-stage models. This substantiates the effectiveness of EMDFNet in detecting small traffic signs.
