Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention

Kosti Koistinen; Kirsi Hellsten; Joni Herttuainen; Kimmo K. Kaski

TL;DR

The paper proposes a Spatio-Temporal Attention Graph Neural Network (STA-GNN) for unsupervised, explainable anomaly detection in Industrial Control Systems (ICS). The model captures both the temporal dynamics and the relational structure of the system, enabling unified cyber-physical analysis.

Abstract

Industrial Control Systems (ICS) underpin critical infrastructure and face growing cyber-physical threats due to the convergence of operational technology and networked environments. While machine learning-based anomaly detection approaches for ICS show strong theoretical performance, deployment is often limited by poor explainability, high false-positive rates, and sensitivity to evolving system behavior, i.e., baseline drift. We propose a Spatio-Temporal Attention Graph Neural Network (STA-GNN) for unsupervised and explainable anomaly detection in ICS that models both the temporal dynamics and the relational structure of the system. Sensors, controllers, and network entities are represented as nodes in a dynamically learned graph, enabling the model to capture inter-dependencies across physical processes and communication patterns. Attention mechanisms expose the most influential relationships, supporting inspection of correlations and potential causal pathways behind detected events. The approach supports multiple data modalities, including SCADA point measurements, network flow features, and payload features, and thus enables unified cyber-physical analysis. To address operational requirements, we incorporate a conformal prediction strategy to control false alarm rates and to monitor performance degradation as the environment drifts. Our findings highlight the possibilities and limitations of model evaluation, as well as common pitfalls in anomaly detection in ICS, and emphasise the importance of explainable, drift-aware evaluation for reliable deployment of learning-based security monitoring systems.
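
The abstract does not spell out the conformal prediction strategy, so the following is only a minimal sketch of one common variant, split conformal calibration of an anomaly-score threshold. The function names (conformal_threshold, flag_anomalies), the target rate alpha, and the use of held-out normal-operation scores are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def conformal_threshold(calibration_scores, alpha):
    """Split-conformal threshold (illustrative, not the paper's exact method).

    Given anomaly scores from held-out NORMAL data, return a threshold that a
    fresh normal score exceeds with probability at most alpha, assuming the
    calibration and test scores are exchangeable.
    """
    scores = np.sort(np.asarray(calibration_scores, dtype=float))
    n = scores.size
    # Rank of the conformal quantile: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    if k > n:
        # Too few calibration points to certify this alpha: never raise alarms.
        return np.inf
    return scores[k - 1]

def flag_anomalies(scores, threshold):
    """Mark scores strictly above the calibrated threshold as anomalous."""
    return np.asarray(scores, dtype=float) > threshold

# Hypothetical calibration scores from attack-free operation.
calibration = np.arange(1, 100, dtype=float)
threshold = conformal_threshold(calibration, alpha=0.1)
alarms = flag_anomalies([0.5, 100.0], threshold)
```

Under this sketch, recalibrating after drift amounts to calling conformal_threshold again on scores computed from a more recent attack-free baseline, leaving the trained model untouched.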

Paper Structure (24 sections, 26 equations, 7 figures, 6 tables)

Introduction
Related Work
Methodology
Architecture
Training Objective
Anomaly Scoring
Graph Explanations
Model Evaluation
Benchmark Data
Data Pre-Processing & Model Training
Physical-level Data.
NetFlow Dataset.
NetFlow + Payload Dataset.
Training, Calibration, and Sampling.
Results
...and 9 more sections

Figures (7)

Figure 1: A schematic overview of the STA-GNN model architecture. The workflow illustrates the processing stages from input windows to the decoder producing predictions. The intermediate blocks employ a two-phase attention mechanism that generates two complementary graphs, enabling inspection of the model's decision making.
Figure 2: Example of a detected attack. Contributions are colour-coded from red (highest) to yellow (lowest). The grey edges represent the learned embeddings + prior graph structure. The red edges come from the spatial attention. Only the strongest attention weights from/to anomalous nodes are plotted for interpretability. Red edge thickness reflects the strength of the attention. The graph nodes are organised and fixed by process stages in the SWaT testbed dataset used in this study.
Figure 3: Comparison of normalised sensor response windows (shaded red) during the attack window (shaded blue and delimited by a blue dashed line). The attack on the left was detected only once, at the beginning of the attack. The attack on the right was detected multiple times during the attack, from various sensors and actuators (a cascade failure). For clarity, we only show the top 3 anomalous sensors per detected anomaly.
Figure 4: FPR across datasets. Top: Model performance with (red) and without (blue) retraining. Bottom: Performance with recalibration of the 2015 model using the 2017 baseline. The FPR can be controlled with recalibration, which is often more feasible than retraining the model.
Figure 5: Attack on DPIT301 detected via anomalies in FIT601, with attention edges highlighting system-level dependencies between distant process stages.
...and 2 more figures
