ST-GraphNet: A Spatio-Temporal Graph Neural Network for Understanding and Predicting Automated Vehicle Crash Severity

Mahmuda Sultana Mimi, Md Monzurul Islam, Anannya Ghosh Tusti, Shriyank Somvanshi, Subasish Das

TL;DR

ST-GraphNet introduces a spatio-temporal graph neural network that leverages coarse-grained H3 hexagon aggregation to predict automated vehicle crash severity from multimodal data. Built on a DSTGCN backbone, it fuses region-level aggregates (SAE histograms, injury counts, temporal histograms) with narrative embeddings to model diffusion across the hex grid and temporal evolution. On Texas AV crash data, the approach achieves a test F1 of 0.9774 and an AUC of 0.998, significantly outperforming fine-grained baselines and illustrating the value of region-level representations for safety analysis. The work highlights practical implications for AV safety planning and suggests future work on multi-resolution graphs, real-time updates, and interpretability to broaden applicability.
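The region-level aggregates mentioned above (SAE histograms, injury counts, temporal histograms) can be illustrated with a minimal sketch. It assumes each crash record already carries a precomputed H3 cell identifier. The field names (`cell`, `sae_level`, `injuries`, `hour`), the four 6-hour time bins, and the feature layout are hypothetical choices made for illustration, not details taken from the paper. Narrative embeddings are omitted here.

```python
from collections import defaultdict

SAE_LEVELS = [0, 1, 2, 3, 4, 5]  # SAE driving-automation levels 0 to 5
HOUR_BINS = 4                    # assumption: four 6-hour bins per day


def aggregate_by_cell(crashes):
    """Aggregate crash records into one feature vector per spatial cell.

    Layout of each vector (hypothetical):
    [SAE histogram (6 values), total injuries (1 value), hour histogram (4 values)]
    """
    width = len(SAE_LEVELS) + 1 + HOUR_BINS
    feats = defaultdict(lambda: [0] * width)
    for crash in crashes:
        vec = feats[crash["cell"]]
        # Count crashes per SAE automation level.
        vec[SAE_LEVELS.index(crash["sae_level"])] += 1
        # Sum injuries reported in this cell.
        vec[len(SAE_LEVELS)] += crash["injuries"]
        # Count crashes per 6-hour time-of-day bin.
        hour_bin = crash["hour"] * HOUR_BINS // 24
        vec[len(SAE_LEVELS) + 1 + hour_bin] += 1
    return dict(feats)


# Toy records; cell IDs are placeholders, not real H3 indexes.
crashes = [
    {"cell": "cell_a", "sae_level": 2, "injuries": 1, "hour": 8},
    {"cell": "cell_a", "sae_level": 2, "injuries": 0, "hour": 23},
    {"cell": "cell_b", "sae_level": 4, "injuries": 2, "hour": 13},
]
features = aggregate_by_cell(crashes)
for cell, vec in sorted(features.items()):
    print(cell, vec)
```

In a fuller pipeline, each vector would presumably be concatenated with a pooled narrative embedding for the cell before being passed to the graph model.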

Abstract

Understanding the spatial and temporal dynamics of automated vehicle (AV) crash severity is critical for advancing urban mobility safety and infrastructure planning. In this work, we introduce ST-GraphNet, a spatio-temporal graph neural network framework designed to model and predict AV crash severity using both fine-grained and region-aggregated spatial graphs. Using a balanced dataset of 2,352 real-world AV-related crash reports from Texas (2024), including geospatial coordinates, crash timestamps, SAE automation levels, and narrative descriptions, we construct two complementary graph representations: (1) a fine-grained graph with individual crash events as nodes, where edges are defined via spatio-temporal proximity; and (2) a coarse-grained graph where crashes are aggregated into Hexagonal Hierarchical Spatial Indexing (H3)-based spatial cells, connected through hexagonal adjacency. Each node is enriched with multimodal semantic, spatial, and temporal attributes, including textual embeddings of the crash narratives produced by a pretrained Sentence-BERT model. We evaluate several graph neural network (GNN) architectures, namely Graph Convolutional Networks (GCN), Graph Attention Networks (GAT), and Dynamic Spatio-Temporal GCN (DSTGCN), to classify crash severity and predict high-risk regions. Our proposed ST-GraphNet, which uses a DSTGCN backbone on the coarse-grained H3 graph, achieves a test accuracy of 97.74%, substantially outperforming the best fine-grained model (64.7% test accuracy). These findings highlight the effectiveness of spatial aggregation, dynamic message passing, and multimodal feature integration in capturing the complex spatio-temporal patterns underlying AV crash severity.
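To make the fine-grained graph construction concrete, here is a minimal sketch of linking crash events by spatio-temporal proximity. The abstract does not state the thresholds or the distance measure, so the 1 km and 24 hour cutoffs, the haversine distance, and the record fields (`lat`, `lon`, `time`) are all assumptions made for illustration.

```python
import math
from datetime import datetime

MAX_KM = 1.0       # assumed spatial threshold, not from the paper
MAX_HOURS = 24.0   # assumed temporal threshold, not from the paper


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def build_edges(events, max_km=MAX_KM, max_hours=MAX_HOURS):
    """Return undirected edges (i, j), i < j, between events close in space and time."""
    edges = []
    for i in range(len(events)):
        for j in range(i + 1, len(events)):
            a, b = events[i], events[j]
            near = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km
            recent = abs((a["time"] - b["time"]).total_seconds()) <= max_hours * 3600
            if near and recent:
                edges.append((i, j))
    return edges


# Toy events with made-up coordinates and timestamps.
events = [
    {"lat": 30.2672, "lon": -97.7431, "time": datetime(2024, 3, 1, 8, 0)},
    {"lat": 30.2700, "lon": -97.7440, "time": datetime(2024, 3, 1, 17, 30)},
    {"lat": 30.2675, "lon": -97.7435, "time": datetime(2024, 3, 9, 8, 0)},  # nearby, but days later
    {"lat": 32.7767, "lon": -96.7970, "time": datetime(2024, 3, 1, 9, 0)},  # same day, far away
]
edges = build_edges(events)
print(edges)
```

The pairwise loop is quadratic in the number of events, which is acceptable for a dataset of a few thousand reports; a spatial index would be the usual choice for larger inputs.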

Paper Structure

This paper contains 25 sections, 9 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Fine Grain Graph Architecture
  • Figure 2: Coarse Grain Graph Architecture
  • Figure 3: Proposed Study Design
  • Figure 4: Left: Training and validation accuracy over 30 epochs. Right: Training and validation F1 score over 30 epochs. Both metrics improve rapidly, and the small gap between the training and validation curves indicates that overfitting is controlled.
  • Figure 5: Overview of the proposed ST-GraphNet framework. On the left, raw and contextual inputs are combined in a multimodal data processing block to produce two parallel streams. On the right, the temporal convolution stream (top) applies successive 1 × N and N × 1 convolutions to extract evolving spatial patterns from traffic matrices, while the graph prediction and propagation stream (bottom) dynamically learns time-varying adjacency (affinity) matrices and feeds them into stacked Spatio-Temporal Graph Convolution (STC) layers.
  • ...and 1 more figures