Sequence-Preserving Dual-FoV Defense for Traffic Sign and Light Recognition in Autonomous Vehicles

Abhishek Joshi, Jahnavi Krishna Koda, Abhishek Phadke

TL;DR

This work addresses traffic light and sign recognition in autonomous vehicles under both digital and naturally occurring perturbations. It proposes a sequence-preserving dual-FoV framework and a unified three-layer defense stack, augmented by sequence-level temporal voting, and tests them on a new multi-source, dual-FoV dataset spanning four operational design domains. The approach improves robustness, reaching an mAP of up to 79.8 while reducing the attack success rate to 18.2%, and demonstrates the value of cross-FoV validation and temporal consistency. The study acknowledges limitations in geographic coverage, compound perturbations, and deployment latency, and outlines future directions toward broader datasets, enhanced robustness, and closer integration with downstream planning.

Abstract

Traffic light and sign recognition is critical for autonomous vehicles (AVs) because perception errors directly affect navigation and safety. In addition to digital adversarial attacks, models are vulnerable to naturally occurring perturbations (glare, rain, dirt, or graffiti), which can lead to dangerous misclassifications. Existing work lacks consideration of temporal continuity, multistatic field-of-view (FoV) sensing, and robustness to both digital and natural degradation. This study proposes a dual-FoV, sequence-preserving robustness framework for traffic lights and signs in the USA, based on a multi-source dataset built from aiMotive, Udacity, Waymo, and self-recorded videos from Texas. Mid- and long-term sequences of RGB images are temporally aligned for four operational design domains (ODDs): highway, night, rainy, and urban. Through a series of experiments on a real-life application of anomaly detection, this study outlines a unified three-layer defense stack that incorporates feature squeezing, defensive distillation, and entropy-based anomaly detection, with sequence-wise temporal voting for further enhancement. The evaluation measures include accuracy, attack success rate (ASR), risk-weighted misclassification severity, and confidence stability. Physical transferability was confirmed using recapture probes. The Unified Defense Stack achieved 79.8 mAP and reduced the ASR to 18.2%, outperforming YOLOv8, YOLOv9, and BEVFormer, while reducing high-risk misclassifications to 32%.
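To make the interplay of entropy-based anomaly detection and sequence-wise temporal voting concrete, the following is a minimal sketch and not the authors' implementation. It assumes each frame yields a class-probability vector, flags frames whose Shannon entropy exceeds a threshold, and takes a majority vote over the remaining frames. The function names, the `max_entropy` value of 0.8, and the example probabilities are all hypothetical.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def temporal_vote(frame_probs, labels, max_entropy=0.8):
    """Majority vote over a frame sequence, skipping high-entropy frames.

    frame_probs: list of per-frame probability vectors.
    labels: class names aligned with the entries of each vector.
    max_entropy: hypothetical threshold; frames above it are treated
        as anomalous and excluded from the vote.
    Returns (voted_label_or_None, indices_of_flagged_frames).
    """
    votes = Counter()
    flagged = []
    for i, probs in enumerate(frame_probs):
        if entropy(probs) > max_entropy:
            flagged.append(i)  # uncertain frame: flag it, do not vote
            continue
        best = max(range(len(probs)), key=probs.__getitem__)
        votes[labels[best]] += 1
    if not votes:
        return None, flagged  # every frame was flagged
    return votes.most_common(1)[0][0], flagged


# Illustrative sequence: one near-uniform frame and one outlier frame.
labels = ["red", "yellow", "green"]
sequence = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.34, 0.33, 0.33],  # near-uniform, so high entropy
    [0.05, 0.05, 0.90],  # confident but inconsistent with its neighbors
    [0.80, 0.15, 0.05],
]
label, flagged = temporal_vote(sequence, labels)
print(label, flagged)
```

In this sketch a single confidently wrong frame is outvoted by its neighbors, while an uncertain frame is excluded before voting; how the paper weights, windows, or combines these steps across the two FoVs is not specified in the abstract.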

Paper Structure

This paper contains 64 sections, 5 figures, 7 tables, 1 algorithm.

Figures (5)

  • Figure 1: Illustrative frames from multi-source datasets across four ODDs showing dual-FoV capture (Mid-range and Long-range cameras).
  • Figure 2: Taxonomy of perturbations and defenses in AV perception. This framework explicitly addresses digital and physical perturbations, temporal dynamics, and hybrid attacks.
  • Figure 3: End-to-end pipeline: multi-source curation (Dual-FoV, sequence-preserving), perturbation suite (natural + digital), baselines, unified defense stack, and ODD-aware evaluation.
  • Figure 4: Performance degradation under individual and compound perturbations. Error bars indicate 95% CI.
  • Figure 5: Recovery curves comparing frame-by-frame inference vs. temporal voting. Shaded regions indicate 95% CI.