ST-GraphNet: A Spatio-Temporal Graph Neural Network for Understanding and Predicting Automated Vehicle Crash Severity
Mahmuda Sultana Mimi, Md Monzurul Islam, Anannya Ghosh Tusti, Shriyank Somvanshi, Subasish Das
TL;DR
ST-GraphNet introduces a spatio-temporal graph neural network that leverages coarse-grained H3 hexagon aggregation to predict automated vehicle crash severity from multimodal data. Built on a DSTGCN backbone, it fuses region-level aggregates (SAE histograms, injury counts, temporal histograms) with narrative embeddings to model diffusion across the hex grid and temporal evolution. On Texas AV crash data, the approach achieves a test F1 of 0.9774 and an AUC of 0.998, significantly outperforming fine-grained baselines and illustrating the value of region-level representations for safety analysis. The work highlights practical implications for AV safety planning and suggests future work on multi-resolution graphs, real-time updates, and interpretability to broaden applicability.
Abstract
Understanding the spatial and temporal dynamics of automated vehicle (AV) crash severity is critical for advancing urban mobility safety and infrastructure planning. In this work, we introduce ST-GraphNet, a spatio-temporal graph neural network framework designed to model and predict AV crash severity by using both fine-grained and region-aggregated spatial graphs. Using a balanced dataset of 2,352 real-world AV-related crash reports from Texas (2024), including geospatial coordinates, crash timestamps, SAE automation levels, and narrative descriptions, we construct two complementary graph representations: (1) a fine-grained graph with individual crash events as nodes, where edges are defined via spatio-temporal proximity; and (2) a coarse-grained graph where crashes are aggregated into Hexagonal Hierarchical Spatial Indexing (H3)-based spatial cells, connected through hexagonal adjacency. Each node in the graph is enriched with multimodal data, including semantic, spatial, and temporal attributes, including textual embeddings from crash narratives using a pretrained Sentence-BERT model. We evaluate various graph neural network (GNN) architectures, such as Graph Convolutional Networks (GCN), Graph Attention Networks (GAT), and Dynamic Spatio-Temporal GCN (DSTGCN), to classify crash severity and predict high-risk regions. Our proposed ST-GraphNet, which utilizes a DSTGCN backbone on the coarse-grained H3 graph, achieves a test accuracy of 97.74\%, substantially outperforming the best fine-grained model (64.7\% test accuracy). These findings highlight the effectiveness of spatial aggregation, dynamic message passing, and multi-modal feature integration in capturing the complex spatio-temporal patterns underlying AV crash severity.
