Real-Time Driver Safety Scoring Through Inverse Crash Probability Modeling

Joyjit Roy; Samaresh Kumar Singh

Abstract

Road crashes remain a leading cause of preventable fatalities. Existing prediction models predominantly produce binary outcomes, which offer limited actionable insights for real-time driver feedback. These approaches often lack continuous risk quantification, interpretability, and explicit consideration of vulnerable road users (VRUs), such as pedestrians and cyclists. This research introduces SafeDriver-IQ, a framework that transforms binary crash classifiers into continuous 0-100 safety scores by combining national crash statistics with naturalistic driving data from autonomous vehicles. The framework fuses National Highway Traffic Safety Administration (NHTSA) crash records with Waymo Open Motion Dataset scenarios, engineers domain-informed features, and incorporates a calibration layer grounded in transportation safety literature. Evaluation across 15 complementary analyses indicates that the framework reliably differentiates high-risk from low-risk driving conditions with strong discriminative performance. Findings further reveal that 87% of crashes involve multiple co-occurring risk factors, with non-linear compounding effects that increase the risk to 4.5x baseline. SafeDriver-IQ delivers proactive, explainable safety intelligence relevant to advanced driver-assistance systems (ADAS), fleet management, and urban infrastructure planning. This framework shifts the focus from reactive crash counting to real-time risk prevention.
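The core idea of turning a binary classifier's crash probability into a continuous 0-100 safety score can be illustrated with a minimal sketch. The linear inversion below is an assumption made for illustration only: the abstract does not state the exact mapping, and the paper's literature-grounded calibration layer is not reproduced here. The function name `safety_score` is hypothetical.

```python
def safety_score(crash_probability: float) -> float:
    """Map a crash probability in [0, 1] to a 0-100 score (higher is safer).

    Assumption: a plain linear inversion, score = 100 * (1 - p).
    The framework's actual calibration layer may reshape this mapping.
    """
    if not 0.0 <= crash_probability <= 1.0:
        raise ValueError("crash_probability must lie in [0, 1]")
    return 100.0 * (1.0 - crash_probability)


if __name__ == "__main__":
    # Hypothetical probabilities, as a classifier's predict_proba might return.
    for p in (0.0, 0.25, 1.0):
        print(p, safety_score(p))
```

Under this assumed mapping, a predicted crash probability of 0.25 corresponds to a score of 75, and any monotone calibration could be substituted for the linear step without changing the interface.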

Paper Structure (42 sections, 3 equations, 13 figures, 14 tables)

Introduction
Related Work
Crash Prediction and Risk Modeling
Driver Safety Scoring
Vulnerable Road User Safety
Interpretable Machine Learning in Safety
Naturalistic Driving Datasets
Methodology
Data Sources
NHTSA Crash Report Sampling System (CRSS)
Waymo Open Motion Dataset (WOMD)
Safe Driving Sample Construction
Feature Engineering
Crash Factor Investigation
Model Training
...and 27 more sections

Figures (13)

Figure 1: SafeDriver-IQ full system architecture covering the data layer, feature engineering, crash factor investigation, ML pipeline, inverse modeling, real-time risk classification, and application deployment.
Figure 2: Precision-Recall curve for the crash class (AP = 0.891). The operating point at the default 0.5 threshold yields precision = 0.941 and recall = 0.480.
Figure 3: Confusion matrix for the RF model on the test set (n = 9,278) for the binary crash classification task.
Figure 4: Risk level confusion matrix across 864 driving scenarios. Expected labels come from domain-expert assessment. Overall accuracy is 87.0%, and every misclassification falls between adjacent risk levels.
Figure 5: Primary contributing factors from CRSS (2016-2023, 213,003 crashes). Rush hour (35.3%) and poor lighting (29.2%) are most frequent, whereas VRU involvement (8.7%) exhibits disproportionate severity.
...and 8 more figures
