Understanding Real-World Traffic Safety through RoadSafe365 Benchmark
Xinyu Liu, Darryl C. Jacob, Yuxin Liu, Xinsong Du, Muchao Ye, Bolei Zhou, Pan He
TL;DR
RoadSafe365 addresses the gap between real-world traffic safety understanding and data-driven vision-language models by introducing a large-scale, NSC-aligned benchmark. It builds a hierarchical taxonomy, curates 36,196 real-world clips from dashcam and surveillance sources, and provides rich attribute annotations, multiple-choice VQA sets, and dense captions to enable fine-grained safety reasoning. Experimental results show that domain-specific fine-tuning on RoadSafe365 improves VQA accuracy and caption quality, with notable gains in safety-critical categories and strong cross-domain transfer to synthetic datasets. The benchmark offers a standardized, scalable platform for training and evaluating interpretable, safety-aware multimodal models in transportation contexts.
Abstract
Although recent traffic benchmarks have advanced multimodal data analysis, they generally lack systematic evaluation aligned with official safety standards. To fill this gap, we introduce RoadSafe365, a large-scale vision-language benchmark that supports fine-grained analysis of traffic safety across large and diverse collections of real-world video. Unlike prior works, which focus primarily on coarse accident identification, RoadSafe365 is independently curated and systematically organized using a hierarchical taxonomy that refines and extends the foundational definitions of crash, incident, and violation, bridging official traffic safety standards and data-driven traffic understanding systems. RoadSafe365 provides rich attribute annotations across diverse traffic event types, environmental contexts, and interaction scenarios, covering 36,196 annotated clips from both dashcam and surveillance cameras. Each clip is paired with multiple-choice question-answer sets and a detailed scene description; in total, these comprise 864K candidate options, 8.4K unique answers, and 36K scene descriptions, collectively designed for vision-language understanding and reasoning. We establish strong baselines and observe consistent gains when fine-tuning on RoadSafe365. Cross-domain experiments on both real and synthetic datasets further validate its effectiveness. Designed for large-scale training and standardized evaluation, RoadSafe365 provides a comprehensive benchmark to advance reproducible research in real-world traffic safety analysis.
