Understanding Real-World Traffic Safety through RoadSafe365 Benchmark

Xinyu Liu; Darryl C. Jacob; Yuxin Liu; Xinsong Du; Muchao Ye; Bolei Zhou; Pan He

TL;DR

RoadSafe365 tackles the gap between real-world traffic safety understanding and data-driven vision-language models by introducing a large-scale, NSC-aligned benchmark. It builds a hierarchical taxonomy, curates 36,196 real-world clips from dashcam and surveillance sources, and provides rich attributes, VQA sets, and dense captions to enable fine-grained safety reasoning. Experimental results show that domain-specific fine-tuning on RoadSafe365 improves VQA accuracy and caption quality, with notable gains in safety-critical categories and strong cross-domain transfer to synthetic datasets. This benchmark offers a standardized, scalable platform for training and evaluating interpretable, safety-aware multimodal models in transportation contexts.

Abstract

Although recent traffic benchmarks have advanced multimodal data analysis, they generally lack systematic evaluation aligned with official safety standards. To fill this gap, we introduce RoadSafe365, a large-scale vision-language benchmark that supports fine-grained analysis of traffic safety from extensive and diverse real-world video data collections. Unlike prior works that focus primarily on coarse accident identification, RoadSafe365 is independently curated and systematically organized using a hierarchical taxonomy that refines and extends foundational definitions of crash, incident, and violation to bridge official traffic safety standards with data-driven traffic understanding systems. RoadSafe365 provides rich attribute annotations across diverse traffic event types, environmental contexts, and interaction scenarios, yielding 36,196 annotated clips from both dashcam and surveillance cameras. Each clip is paired with multiple-choice question-answer sets, comprising 864K candidate options, 8.4K unique answers, and 36K detailed scene descriptions collectively designed for vision-language understanding and reasoning. We establish strong baselines and observe consistent gains when fine-tuning on RoadSafe365. Cross-domain experiments on both real and synthetic datasets further validate its effectiveness. Designed for large-scale training and standardized evaluation, RoadSafe365 provides a comprehensive benchmark to advance reproducible research in real-world traffic safety analysis.

Paper Structure (28 sections, 18 figures, 14 tables)

Introduction
Related Work
RoadSafe365
Video Data Collection and Preprocessing
Taxonomy for Traffic Safety Understanding
Annotation Pipeline
Summary of RoadSafe365 Statistics
Experiments
Experimental Setup
Tasks and Evaluation Protocol
Main Results on Tasks
Cross-domain Generalization
Conclusion
Appendix
Annotation Pipeline Details
...and 13 more sections

Figures (18)

Figure 1: Overview of the taxonomy for traffic safety understanding defined in the RoadSafe365 benchmark.
Figure 2: Overview of RoadSafe365: Collection, Annotation, Training, and Evaluation.
Figure 3: RoadSafe365 Annotation Taxonomy and Data Distribution. (a) shows the distribution (log scale) of the five Level-1 categories in our taxonomy. (b) illustrates the Level-2 subcategories nested within each Level-1 category.
Figure 4: Comparison of accident captions generated by Qwen2.5-VL-7B before and after fine-tuning on RoadSafe365.
Figure 5: VQA performance of Qwen2.5-VL-7B fine-tuned on RoadSafe365 across different training iterations. NF denotes the original model without reinforcement fine-tuning.
...and 13 more figures
