Traffic Sign Recognition in Autonomous Driving: Dataset, Benchmark, and Field Experiment

Guoyang Zhao, Weiqing Qi, Kai Zhang, Chenguang Zhang, Zeying Gong, Zhihai Bi, Kai Chen, Benshan Ma, Ming Liu, Jun Ma

Abstract

Traffic Sign Recognition (TSR) is a core perception capability for autonomous driving, where robustness to cross-region variation, long-tailed categories, and semantic ambiguity is essential for reliable real-world deployment. Despite steady progress in recognition accuracy, existing traffic sign datasets and benchmarks offer limited diagnostic insight into how different modeling paradigms behave under these practical challenges. We present TS-1M, a large-scale and globally diverse traffic sign dataset comprising over one million real-world images across 454 standardized categories, together with a diagnostic benchmark designed to analyze model capability boundaries. Beyond standard train-test evaluation, we provide a suite of challenge-oriented settings, including cross-region recognition, rare-class identification, low-clarity robustness, and semantic text understanding, enabling systematic and fine-grained assessment of modern TSR models. Using TS-1M, we conduct a unified benchmark across three representative learning paradigms: classical supervised models, self-supervised pretrained models, and multimodal vision-language models (VLMs). Our analysis reveals consistent paradigm-dependent behaviors, showing that semantic alignment is a key factor for cross-region generalization and rare-category recognition, while purely visual models remain sensitive to appearance shift and data imbalance. Finally, we validate the practical relevance of TS-1M through real-scene autonomous driving experiments, where traffic sign recognition is integrated with semantic reasoning and spatial localization to support map-level decision constraints. Overall, TS-1M establishes a reference-level diagnostic benchmark for TSR and provides principled insights into robust and semantic-aware traffic sign perception. Project page: https://guoyangzhao.github.io/projects/ts1m.

Paper Structure (38 sections, 10 equations, 8 figures, 9 tables)

Figures (8)

  • Figure 1: A diagnostic benchmark for traffic sign recognition. The dataset contains over one million images across 454 categories and supports comprehensive benchmarking of diverse model families under four key challenge suites. The benchmark is further validated through real-driving experiments that integrate recognition, semantic understanding, and spatial localization.
  • Figure 2: Overview of the TS-1M dataset construction process. The pipeline covers multi-source data collection, unified image preprocessing, semantic label normalization across heterogeneous datasets, and data augmentation to support large-scale, consistent, and robust benchmarking for TSR.
  • Figure 3: Statistical overview of the TS-1M dataset. The statistics summarize the spatial geographic distribution, category-wise sample distribution, and pixel-resolution distribution of images in TS-1M.
  • Figure 4: Examples of challenge-oriented sets in TS-1M. The figure illustrates representative samples from the cross-region, rare-class, low-clarity, and semantic description sets designed for systematic TSR evaluation.
  • Figure 5: Modeling frameworks of three representative learning paradigms in TS-1M. The figure compares classical supervised models, self-supervised pretrained models, and VLMs in terms of their training pipelines and inference mechanisms.
  • ...and 3 more figures