Graph Neural Networks for Road Safety Modeling: Datasets and Evaluations for Accident Analysis
Abhinav Nippani, Dongyue Li, Haotian Ju, Haris N. Koutsopoulos, Hongyang R. Zhang
TL;DR
The paper addresses predicting road-traffic accidents on road networks by constructing a large-scale, unified dataset of 9 million accident records across eight US states and integrating road graphs, traffic volume, and weather. It evaluates graph neural networks, notably GraphSAGE, for edge-level accident prediction using multitask learning across states and transfer learning to incorporate annual traffic volume as an auxiliary task. The results show $\mathrm{MAE} \approx 0.3$ and $\mathrm{AUROC} \approx 0.87$ on average, with multitask learning and volume transfer providing additional gains, and reveal that road-network structure is highly informative for risk assessment. The work also provides a public ML4RoadSafety package to facilitate reuse and cross-state analyses, highlighting practical implications for policy and safety interventions.
Abstract
We consider the problem of traffic accident analysis on a road network based on road network connections and traffic volume. Previous works have designed various deep-learning methods using historical records to predict traffic accident occurrences. However, there is a lack of consensus on how accurate existing methods are, and a fundamental issue is the lack of public accident datasets for comprehensive evaluations. This paper constructs a large-scale, unified dataset of traffic accident records from official reports of various states in the US, totaling 9 million records, accompanied by road networks and traffic volume reports. Using this new dataset, we evaluate existing deep-learning methods for predicting the occurrence of accidents on road networks. Our main finding is that graph neural networks such as GraphSAGE can accurately predict the number of accidents on roads with less than 22% mean absolute error (relative to the actual count) and whether an accident will occur or not with over 87% AUROC, averaged over states. We achieve these results by using multitask learning to account for cross-state variabilities (e.g., availability of accident labels) and transfer learning to combine traffic volume with accident prediction. Ablation studies highlight the importance of road graph-structural features, amongst other features. Lastly, we discuss the implications of the analysis and develop a package for easily using our new dataset.
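The two headline metrics can be illustrated with a minimal sketch. It is not taken from the paper's package: the function names and toy numbers are hypothetical, and the normalization used for the relative error (total absolute error divided by the total actual count) is an assumption about what "relative to the actual count" means. AUROC is computed from its pairwise definition, the probability that a road with an accident receives a higher score than a road without one.

```python
def relative_mae(predicted, actual):
    """Total absolute error divided by the total actual count.

    The normalization is an assumption made for illustration; the
    abstract does not spell out the exact formula.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    total = sum(actual)
    if total == 0:
        raise ValueError("actual counts sum to zero; relative error is undefined")
    abs_err = sum(abs(p - a) for p, a in zip(predicted, actual))
    return abs_err / total


def auroc(scores, labels):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-road accident counts and model predictions.
actual_counts = [0, 2, 1, 0, 4]
predicted_counts = [0.5, 1.5, 1.0, 0.0, 3.0]
print(relative_mae(predicted_counts, actual_counts))

# Hypothetical binary labels (accident occurred or not) and risk scores.
occurred = [0, 1, 1, 0, 1]
risk_scores = [0.2, 0.7, 0.4, 0.5, 0.9]
print(auroc(risk_scores, occurred))
```

The pairwise loop is quadratic in the number of roads, which is fine for a toy example; at the scale of a state road network, a rank-based implementation from a standard library would be the practical choice.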
