STAGNet: A Spatio-Temporal Graph and LSTM Framework for Accident Anticipation

Vipooshan Vipulananthan, Kumudu Mohottala, Kavindu Chinthana, Nimsara Paramulla, Charith D Chitraranjan

TL;DR

STAGNet tackles accident anticipation from dash-cam video by fusing spatio-temporal object graphs with an LSTM-based temporal model and SlowFast-derived frame features. The architecture comprises three modules: Spatio-Temporal Object Graph Learning, Spatio-Temporal Frame-level Feature Learning, and Frame Graph Learning. A lightweight variant, STAGNet-Lite, is also provided. Across the DAD, DoTA, and DADA datasets, STAGNet delivers state-of-the-art average precision and mean time-to-accident, including strong cross-dataset generalization, while STAGNet-Lite offers significant speedups for ego-involved scenarios. The results highlight the importance of explicit temporal modeling, SlowFast features, and attention-based graph processing for robust, real-time accident anticipation.

Abstract

Accident prediction and timely warnings play a key role in improving road safety by reducing the risk of injury to road users and minimizing property damage. Advanced Driver Assistance Systems (ADAS) are designed to support human drivers and are especially useful when they can anticipate potential accidents before they happen. While many existing systems depend on a range of sensors such as LiDAR, radar, and GPS, relying solely on dash-cam video input is more challenging but offers a more cost-effective and easily deployable solution. In this work, we incorporate better spatio-temporal features and aggregate them through a recurrent network to improve upon state-of-the-art graph neural networks for predicting accidents from dash-cam videos. Experiments using three publicly available datasets show that our proposed STAGNet model achieves higher average precision and mean time-to-collision values than previous methods, both when cross-validated on a given dataset and when trained and tested on different datasets.
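
The time-to-accident metric mentioned above can be illustrated with a minimal sketch. It assumes one common definition: for each accident video, the time between the first frame whose predicted score reaches a threshold and the annotated accident frame, averaged over the videos that trigger. The function names, the `fps` parameter, and the choice to skip videos that never trigger are assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def time_to_accident(scores: Sequence[float], accident_frame: int,
                     threshold: float, fps: float) -> Optional[float]:
    """Seconds between the first alarm and the accident, or None if no alarm.

    Only frames strictly before `accident_frame` are considered.
    """
    for t, score in enumerate(scores[:accident_frame]):
        if score >= threshold:
            return (accident_frame - t) / fps
    return None


def mean_time_to_accident(videos: List[Tuple[Sequence[float], int]],
                          threshold: float, fps: float) -> float:
    """Average time-to-accident over accident videos that raise an alarm.

    Assumption: videos that never reach the threshold are skipped rather
    than counted as zero; other conventions exist.
    """
    ttas = [time_to_accident(s, a, threshold, fps) for s, a in videos]
    hits = [x for x in ttas if x is not None]
    return sum(hits) / len(hits) if hits else 0.0


if __name__ == "__main__":
    # Hypothetical per-frame scores for two accident videos.
    videos = [([0.1, 0.2, 0.6, 0.9], 4), ([0.1, 0.1, 0.1, 0.1], 4)]
    print(mean_time_to_accident(videos, threshold=0.5, fps=2.0))
```

A lower threshold raises alarms earlier and so lengthens the time-to-accident, usually at the cost of more false alarms, which is why this metric is typically reported alongside average precision.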

Paper Structure

This paper contains 19 sections, 7 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Architecture of the proposed STAGNet accident anticipation framework. Boxes with thicker borders represent the main components introduced in this work. The components within the red dashed rectangle are pruned in the lightweight model, STAGNet-Lite.
  • Figure 2: Configuration of the proposed STAGNet accident anticipation framework.
  • Figure 3: Box plots showing the bit-rates of DoTA and DADA videos.
  • Figure 4: Sample frames from DADA (top row) and DoTA (bottom row). The visual quality of DADA videos is lower than that of DoTA videos.
  • Figure 5: (a) Distribution of crash-object categories and (b) category-wise cross-dataset recall for DoTA and DADA datasets. For each test dataset, the recall values are reported at the probability threshold that yields an overall recall of 80%.
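
The reporting convention in the Figure 5 caption (category-wise recall at the threshold that yields 80% overall recall) can be sketched as follows. This is a minimal illustration under stated assumptions: each accident video is reduced to a single score (for example, its maximum frame-level probability), and the function names and sample data are hypothetical rather than taken from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def threshold_for_recall(pos_scores: List[float], target_recall: float) -> float:
    """Largest threshold at which at least `target_recall` of positives are detected."""
    ranked = sorted(pos_scores, reverse=True)
    # Number of positives that must score at or above the threshold;
    # the small epsilon guards against floating-point overshoot in the product.
    k = max(1, math.ceil(target_recall * len(ranked) - 1e-9))
    return ranked[k - 1]


def recall_by_category(samples: Iterable[Tuple[str, float]],
                       threshold: float) -> Dict[str, float]:
    """Per-category recall for (category, score) pairs of accident videos."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for category, score in samples:
        totals[category] += 1
        hits[category] += int(score >= threshold)
    return {c: hits[c] / totals[c] for c in totals}


if __name__ == "__main__":
    # Hypothetical (crash-object category, video-level score) pairs.
    samples = [("car", 0.9), ("car", 0.8), ("car", 0.3),
               ("pedestrian", 0.7), ("pedestrian", 0.6)]
    thr = threshold_for_recall([s for _, s in samples], 0.8)
    print(thr, recall_by_category(samples, thr))
```

Fixing the overall recall first puts both test datasets at the same operating point, so differences between categories are not confounded by differently calibrated score distributions.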