STAGNet: A Spatio-Temporal Graph and LSTM Framework for Accident Anticipation
Vipooshan Vipulananthan, Kumudu Mohottala, Kavindu Chinthana, Nimsara Paramulla, Charith D Chitraranjan
TL;DR
STAGNet tackles accident anticipation from dash-cam video by fusing spatio-temporal object graphs with an LSTM-based temporal model and SlowFast-derived frame features. The architecture comprises three modules: Spatio-Temporal Object Graph Learning, Spatio-Temporal Frame-level Feature Learning, and Frame Graph Learning, plus a lightweight STAGNet-Lite variant. Across DAD, DoTA, and DADA datasets, STAGNet delivers state-of-the-art average precision and mean time-to-accident, including strong cross-dataset generalization, while STAGNet-Lite offers significant speedups for ego-involved scenarios. The results highlight the importance of explicit temporal modeling, powerful SlowFast features, and attention-based graph processing for robust, real-time accident anticipation.
Abstract
Accident prediction and timely warnings play a key role in improving road safety by reducing the risk of injury to road users and minimizing property damage. Advanced Driver Assistance Systems (ADAS) are designed to support human drivers and are especially useful when they can anticipate potential accidents before they happen. While many existing systems depend on a range of sensors such as LiDAR, radar, and GPS, relying solely on dash-cam video input is more challenging but yields a more cost-effective and easily deployable solution. In this work, we incorporate better spatio-temporal features and aggregate them through a recurrent network to improve upon state-of-the-art graph neural networks for predicting accidents from dash-cam videos. Experiments on three publicly available datasets show that our proposed STAGNet model achieves higher average precision and mean time-to-collision values than previous methods, both when cross-validated on a given dataset and when trained and tested on different datasets.
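
To make the lead-time metric reported above concrete (mean time-to-accident in the TL;DR, time-to-collision in the abstract), the following is a minimal sketch rather than the authors' evaluation code. It assumes each accident video comes with per-frame risk scores in [0, 1] and a known accident frame index, that a warning fires at the first frame whose score reaches a fixed threshold, and that the mean is taken only over videos in which a warning fired. The function names, the fixed threshold of 0.5, and the frame rate are illustrative assumptions; published protocols often average over many thresholds instead.

```python
from typing import Optional, Sequence, Tuple


def time_to_accident(
    scores: Sequence[float],
    accident_frame: int,
    fps: float,
    threshold: float = 0.5,  # assumed fixed threshold, for illustration only
) -> Optional[float]:
    """Seconds between the first warning and the accident frame.

    Returns None if no score before the accident frame reaches the threshold.
    """
    for frame_idx, score in enumerate(scores[:accident_frame]):
        if score >= threshold:
            return (accident_frame - frame_idx) / fps
    return None


def mean_time_to_accident(
    videos: Sequence[Tuple[Sequence[float], int]],
    fps: float,
    threshold: float = 0.5,
) -> Optional[float]:
    """Average lead time over videos in which a warning fired (assumed convention)."""
    lead_times = [
        t
        for scores, accident_frame in videos
        if (t := time_to_accident(scores, accident_frame, fps, threshold)) is not None
    ]
    if not lead_times:
        return None
    return sum(lead_times) / len(lead_times)


if __name__ == "__main__":
    # Hypothetical per-frame scores for three accident videos at 2 frames per second.
    demo_videos = [
        ([0.1, 0.2, 0.6, 0.9], 4),  # warning at frame 2
        ([0.7, 0.8, 0.9, 0.9], 4),  # warning at frame 0
        ([0.1, 0.1, 0.1], 3),       # no warning before the accident
    ]
    print(mean_time_to_accident(demo_videos, fps=2.0))
```

Under this convention, a larger value means the model warns earlier. It is normally read alongside average precision, since a model that raises warnings indiscriminately would score a long lead time at the cost of many false alarms.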
