FUSED-Net: Detecting Traffic Signs with Limited Data

Md. Atiqur Rahman, Nahian Ibn Asad, Md. Mushfiqul Haque Omi, Md. Bakhtiar Hasan, Sabbir Ahmed, Md. Hasanul Kabir

TL;DR

FUSED-Net targets traffic sign detection under limited labeled data by combining four components: unfrozen parameters (every network weight stays trainable), pseudo-support sets (PSS) generated through data augmentation, embedding normalization (EN) implemented via a cosine-similarity classifier, and domain adaptation (DA) through multi-country pretraining on MTSD. Built on Faster R-CNN with a ResNet-101 + FPN backbone, the model uses a cosine-based embedding space to reduce intra-class variance and improve generalization across domains. Empirical results on the target dataset, BDTSD, show substantial gains over state-of-the-art FSOD methods, including a 42.81% average mAP across the 1/3/5/10-shot regimes, with strong cross-domain performance on the CD-FSOD benchmark. The work demonstrates that simple, well-integrated components, trained without freezing any parameters, can significantly enhance few-shot traffic sign detection, offering practical benefits for real-world, data-scarce settings.
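To make the embedding-normalization idea concrete, here is a minimal sketch of a cosine-similarity classifier head in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy vectors, and the scale factor of 20 are hypothetical choices made for this example.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def cosine_logits(feature, class_weights, scale=20.0):
    """Return one logit per class: scale * cos(feature, class weight).

    `scale` is an assumed temperature-like constant, not a value taken
    from the paper.
    """
    f_hat = l2_normalize(feature)
    logits = []
    for weight in class_weights:
        w_hat = l2_normalize(weight)
        logits.append(scale * sum(a * b for a, b in zip(f_hat, w_hat)))
    return logits


# Toy example: one 2-D region feature and three hypothetical class weight vectors.
feature = [3.0, 4.0]
weights = [[1.0, 0.0], [0.0, 2.0], [-3.0, -4.0]]
logits = cosine_logits(feature, weights)

# Multiplying the feature by 10 leaves the logits unchanged, because only the
# direction of the embedding matters once it has been normalized.
logits_rescaled = cosine_logits([30.0, 40.0], weights)
```

Because both the feature and the class weights are normalized, the magnitude of an embedding cannot dominate the score, which is one way a cosine head can reduce intra-class variance.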

Abstract

Automatic Traffic Sign Recognition is paramount in modern transportation systems, motivating several research endeavors to focus on performance improvement by utilizing large-scale datasets. As the appearance of traffic signs varies across countries, curating large-scale datasets is often impractical, which calls for efficient models that can produce satisfactory performance using limited data. In this connection, we present 'FUSED-Net', built upon Faster R-CNN for traffic sign detection and enhanced by Unfrozen Parameters, Pseudo-Support Sets, Embedding Normalization, and Domain Adaptation while reducing the data requirement. Unlike traditional approaches, we keep all parameters unfrozen during training, enabling FUSED-Net to learn from limited samples. The generation of a Pseudo-Support Set through data augmentation further enhances performance by compensating for the scarcity of target domain data. Additionally, Embedding Normalization is incorporated to reduce intra-class variance, standardizing feature representation. Domain Adaptation, achieved by pre-training on a diverse traffic sign dataset distinct from the target domain, improves model generalization. Evaluating FUSED-Net on the BDTSD dataset, we achieved 2.4x, 2.2x, 1.5x, and 1.3x improvements in mAP in the 1-shot, 3-shot, 5-shot, and 10-shot scenarios, respectively, compared to state-of-the-art Few-Shot Object Detection (FSOD) models. Additionally, we outperform state-of-the-art works on the cross-domain FSOD benchmark under several scenarios.
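The pseudo-support set described above can be sketched as follows. This is a rough illustration only: the abstract does not say which augmentations the authors use, so the brightness jitter and horizontal shift below are assumed stand-ins, and every function name and parameter is hypothetical. Images are plain nested lists of grayscale values so the sketch runs without third-party libraries.

```python
import random


def adjust_brightness(image, factor):
    """Multiply every pixel by `factor`, clamping to the 0..255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def shift_right(image, box, dx):
    """Shift the image right by dx pixels (zero padding on the left).

    `box` is (x_min, y_min, x_max, y_max) in pixels; it moves with the
    image and is clipped to the image width.
    """
    width = len(image[0])
    shifted = [[0] * dx + row[: width - dx] for row in image]
    x_min, y_min, x_max, y_max = box
    new_box = (min(x_min + dx, width), y_min, min(x_max + dx, width), y_max)
    return shifted, new_box


def build_pseudo_support_set(image, box, copies=3, seed=0):
    """Return the original sample plus `copies` augmented variants."""
    rng = random.Random(seed)
    width = len(image[0])
    samples = [(image, box)]
    for _ in range(copies):
        factor = rng.uniform(0.7, 1.3)   # assumed brightness range
        dx = rng.randint(0, width // 4)  # assumed maximum shift
        augmented, new_box = shift_right(adjust_brightness(image, factor), box, dx)
        samples.append((augmented, new_box))
    return samples


# Toy 8x8 grayscale "image" with one annotated sign.
image = [[(r * 8 + c) * 4 % 256 for c in range(8)] for r in range(8)]
box = (2, 1, 5, 4)
samples = build_pseudo_support_set(image, box, copies=3, seed=0)
```

Photometric changes and small translations are used here because they leave the meaning of a sign intact; a horizontal flip, by contrast, could turn a "turn left" sign into a "turn right" sign, so the choice of augmentation matters in this domain.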

Paper Structure (33 sections, 4 equations, 8 figures, 5 tables)


Figures (8)

  • Figure 1: Comparison of Mean Average Precision (mAP) between frozen and unfrozen training conditions across different state-of-the-art architectures on the Cross-domain FSOD benchmark. The results demonstrate that keeping parameters unfrozen during training consistently enhances performance, as indicated by the higher mAP scores in the unfrozen condition. (Data adapted from xiong2023cd)
  • Figure 2: Overview of the FUSED-Net Architecture. The architecture begins with a Faster R-CNN model pre-trained on the MSCOCO dataset, which is then modified by incorporating a cosine similarity-based classifier for embedding normalization. This modified Faster R-CNN undergoes domain adaptation using the Merged Traffic Sign Detection Dataset (MTSD), enhancing its ability to handle diverse traffic sign appearances. In the fine-tuning stage, the model learns from a pseudo-support set generated from the target Bangladeshi Traffic Sign Detection Dataset (BDTSD). Throughout the process, all modules of the model remain unfrozen.
  • Figure 3: Samples from the Merged Traffic Sign Detection Dataset (MTSD). They illustrate key differences between European (a, b) and U.S. (c) traffic signs. U.S. signs are rectangular with black text on a white background, while European signs are circular with red borders and black numbers. European signs often feature symbols and pictograms, making them more universally accessible, whereas U.S. signs tend to rely on text. Additionally, as evident from (b), European countries may also position traffic signs on the left side of the road and use multiple signs together.
  • Figure 4: The dataset statistics of MTSD. The sample and annotation counts per category are highly skewed: more than 85% of the categories have fewer than 100 samples, whereas the most frequent categories have more than 1000 samples.
  • Figure 5: Samples from the Bangladeshi Traffic Sign Detection Dataset (BDTSD). They are characterized by red-bordered signs with numbers and pictograms. The samples highlight various detection challenges. (a) shows a traffic sign partially occluded by trees. (b) depicts multiple distant traffic signs, complicating detection. (c) presents a blurry traffic sign, while (d) captures a night scene where the sign blends into the background, making detection difficult.
  • ...and 3 more figures