Simulating the Unseen: Crash Prediction Must Learn from What Did Not Happen
Zihao Li, Xinyuan Cao, Xiangbo Gao, Kexin Tian, Keshu Wu, Mohammad Anis, Hao Zhang, Keke Long, Jiwan Jiang, Xiaopeng Li, Yunlong Zhang, Tianbao Yang, Dominique Lord, Zhengzhong Tu, Yang Zhou
TL;DR
This paper tackles the data paradox in traffic crash prediction: the most harmful events are also the rarest. It argues that effective crash AI must learn from what did not happen, by augmenting data with near-miss scenarios and by linking macro-level crash rates to micro, scenario-level reasoning through AI-driven digital twins. The authors propose an AI framework built on four pillars: an AI-driven digital twin, generative scenario engines, multi-scale validation with causal reasoning, and an intervention brain that uses reinforcement learning (RL) and vision-language models (VLMs). Together, these pillars synthesize diverse, high-fidelity crash and near-crash events and test safety interventions before deployment. By enabling counterfactual analysis, explainable reasoning, and cross-scale feedback, this approach aims to move traffic safety from reactive forensics to proactive prevention, toward Vision Zero.
Abstract
Traffic safety science has long been hindered by a fundamental data paradox: the crashes we most wish to prevent are precisely those events we rarely observe. Existing crash-frequency models and surrogate safety metrics rely heavily on sparse, noisy, and under-reported records, while even sophisticated, high-fidelity simulations undersample the long-tailed situations that trigger catastrophic outcomes such as fatalities. We argue that the path to achieving Vision Zero, i.e., the complete elimination of traffic fatalities and severe injuries, requires a paradigm shift from traditional crash-only learning to a new form of counterfactual safety learning: reasoning not only about what happened, but also about the vast set of plausible yet perilous scenarios that could have happened under slightly different circumstances. To operationalize this shift, our proposed agenda bridges macro to micro. Guided by crash-rate priors, generative scene engines, diverse driver models, and causal learning, near-miss events are synthesized and explained. A crash-focused digital twin testbed links micro scenes to macro patterns, while a multi-objective validator ensures that simulations maintain statistical realism. This pipeline transforms sparse crash data into rich signals for crash prediction, enabling the stress-testing of vehicles, roads, and policies before deployment. By learning from crashes that almost happened, we can shift traffic safety from reactive forensics to proactive prevention, advancing Vision Zero.
