The Context of Crash Occurrence: A Complexity-Infused Approach Integrating Semantic, Contextual, and Kinematic Features

Meng Wang, Zach Noonan, Pnina Gershon, Bruce Mehler, Bryan Reimer, Shannon C. Roberts

TL;DR

This work predicts crash density in complex driving environments by integrating semantic scene information, contextual road attributes, and vehicle kinematics into a two-stage framework. A complexity-infused encoder extracts hidden contextual representations from the multimodal features; these are then combined with the original features to predict crash density, reaching 90.15% accuracy compared with 87.98% when the original features are used alone. The study shows that AI-generated complexity indices (produced by an LLM) have more predictive power than human annotations when integrated with semantic, kinematic, and contextual data, and it provides SHAP-based insights into the factors associated with low, medium, and high crash-density regions. These findings support real-time crash risk estimation, inform driver-assistance and roadway design, and highlight the value of AI-assisted annotation for scalable safety analytics.

Abstract

Understanding the context of crash occurrence in complex driving environments is essential for improving traffic safety and advancing automated driving. Previous studies have used statistical models and deep learning to predict crashes based on semantic, contextual, or vehicle kinematic features, but none have examined the combined influence of these factors. In this study, we term the integration of these features "roadway complexity". This paper introduces a two-stage framework that integrates roadway complexity features for crash prediction. In the first stage, an encoder extracts hidden contextual information from these features, generating complexity-infused features. The second stage uses both original and complexity-infused features to predict crash likelihood, achieving an accuracy of 87.98% with original features alone and 90.15% with the added complexity-infused features. Ablation studies confirm that a combination of semantic, kinematic, and contextual features yields the best results, which underscores their role in capturing roadway complexity. Additionally, complexity index annotations generated by a large language model outperform those from Amazon Mechanical Turk, highlighting the potential of AI-based tools for accurate, scalable crash prediction systems.
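
The data flow of the two-stage framework described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the layer sizes, the `tanh` encoder, the linear classifier, the example feature vector, and the helper names (`encode`, `predict`, `random_matrix`) are all hypothetical, and the weights are random rather than trained. In particular, the step in which the encoder is trained with the complexity index is omitted; the sketch shows only that hidden features are derived from the original features and then concatenated with them before classification into low, medium, or high crash density.

```python
import math
import random

LABELS = ["low", "medium", "high"]  # crash-density classes named in the TL;DR


def linear(rows, x, biases):
    """Affine map: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(rows, biases)]


def encode(x, enc_w, enc_b):
    """Stage 1 (sketch): hidden representation derived from the original features."""
    return [math.tanh(z) for z in linear(enc_w, x, enc_b)]


def predict(x, enc_w, enc_b, clf_w, clf_b):
    """Stage 2 (sketch): classify from original plus hidden features."""
    fused = list(x) + encode(x, enc_w, enc_b)  # concatenate both feature sets
    scores = linear(clf_w, fused, clf_b)
    return LABELS[scores.index(max(scores))], fused


def random_matrix(rng, n_rows, n_cols):
    """Untrained placeholder weights (assumption: stands in for learned parameters)."""
    return [[rng.uniform(-1, 1) for _ in range(n_cols)] for _ in range(n_rows)]


rng = random.Random(0)
n_features, n_hidden = 6, 3          # hypothetical sizes
x = [0.2, 0.5, 0.1, 0.9, 0.3, 0.7]   # hypothetical normalized feature vector

enc_w = random_matrix(rng, n_hidden, n_features)
enc_b = [0.0] * n_hidden
clf_w = random_matrix(rng, len(LABELS), n_features + n_hidden)
clf_b = [0.0] * len(LABELS)

label, fused = predict(x, enc_w, enc_b, clf_w, clf_b)
print(label, len(fused))
```

Because the weights are random, the printed label carries no meaning; the point is that the classifier sees `n_features + n_hidden` inputs, which mirrors the abstract's comparison between original features alone and original plus complexity-infused features.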

Paper Structure

This paper contains 28 sections, 3 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: The model structure. The model takes raw images and CAN-Bus signals as inputs to generate semantic, contextual, and kinematic features, which are then used to investigate their relationship with crash density estimates, serving as the model’s output. It consists of an encoder that learns hidden features from the semantic, kinematic, and contextual data, which are infused with the complexity index. The prediction model then utilizes all the available features, including the complexity-infused features, to predict the crash density and rates. Example data is shown above each feature source.
  • Figure 2: The raw roadway scene image and OneFormer algorithm output. The lead-car region is highlighted in a green box.
  • Figure 3: The prompt used in collecting contextual features with GPT-4o model.
  • Figure 4: Crash density heatmap (2018-2022) in Massachusetts, displayed in red, where darker colors indicate a higher crash density. Five hundred video clips from the MIT-AVT dataset are marked in blue.
  • Figure 5: The distribution of crash density values.
  • ...and 2 more figures