ARES: Anomaly Recognition Model For Edge Streams
Simone Mungari, Albert Bifet, Giuseppe Manco, Bernhard Pfahringer
TL;DR
The paper tackles real-time anomaly detection in edge streams by modeling temporal edge interactions as graphs and proposing ARES, an unsupervised framework that combines Graph Neural Network embeddings with Half-Space Trees for fast anomaly scoring. ARES uses a Graph Autoencoder with a GraphSAGE encoder to generate node and edge embeddings, and employs dual Half-Space Tree ensembles to produce per-edge anomaly scores, enabling detection of both spike and burst anomalies in streaming graphs. To translate anomaly scores into actionable decisions, the method includes a simple, validation-based adaptive thresholding mechanism using the Gini index, allowing domain-robust discrimination with minimal supervision. Extensive experiments on seven real-world cybersecurity datasets show that ARES generally outperforms state-of-the-art methods in ROC-AUC and AP, with favorable scalability and robustness to concept drift, supported by ablation studies validating the effectiveness of the GNN+HST combination.
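The validation-based thresholding idea can be illustrated with a minimal Python sketch. It assumes one plausible reading of "Gini index": the threshold is chosen to minimize the weighted Gini impurity of the two groups it creates on a small labeled validation set, as in a decision stump. The function names `gini_impurity` and `pick_threshold` and the toy data are hypothetical and are not taken from the paper, whose exact procedure may differ.

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a binary label array (0 = normal, 1 = anomalous)."""
    if labels.size == 0:
        return 0.0
    p = labels.mean()  # fraction of anomalous samples
    return 2.0 * p * (1.0 - p)

def pick_threshold(scores, labels):
    """Return (threshold, impurity) minimizing the weighted Gini impurity.

    Assumption: higher score means more anomalous. Candidate thresholds
    are midpoints between consecutive distinct validation scores.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    uniq = np.unique(scores)
    candidates = (uniq[:-1] + uniq[1:]) / 2.0
    best_t, best_g = None, np.inf
    n = scores.size
    for t in candidates:
        below = labels[scores <= t]
        above = labels[scores > t]
        g = (below.size * gini_impurity(below)
             + above.size * gini_impurity(above)) / n
        if g < best_g:
            best_t, best_g = t, g
    return best_t, best_g

# Toy validation set: three normal edges, two anomalous ones.
val_scores = [0.1, 0.2, 0.3, 0.8, 0.9]
val_labels = [0, 0, 0, 1, 1]
threshold, impurity = pick_threshold(val_scores, val_labels)
```

On this toy set the two classes are perfectly separable, so the selected threshold falls between 0.3 and 0.8 and the resulting impurity is zero. Edges whose score exceeds the threshold would then be flagged as anomalous.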
Abstract
Many real-world scenarios involving streaming information can be represented as temporal graphs, where data flows through dynamic changes in edges over time. Anomaly detection in this context aims to identify unusual temporal connections within the graph structure. Detecting edge anomalies in real time is crucial for mitigating potential risks. Unlike traditional anomaly detection, this task is particularly challenging due to concept drift, large data volumes, and the need for real-time response. To address these challenges, we introduce ARES, an unsupervised anomaly detection framework for edge streams. ARES combines Graph Neural Networks (GNNs) for feature extraction with Half-Space Trees (HST) for anomaly scoring. GNNs capture both spike and burst anomalous behaviors within streams by embedding node and edge properties in a latent space, while HST partitions this space to isolate anomalies efficiently. ARES operates in an unsupervised way, without the need for prior data labeling. To further validate its detection capabilities, we additionally incorporate a simple yet effective supervised thresholding mechanism. This approach leverages statistical dispersion among anomaly scores to determine the optimal threshold using a minimal set of labeled data, ensuring adaptability across different domains. We validate ARES through extensive evaluations across several real-world cyber-attack scenarios, comparing its performance against existing methods while analyzing its space and time complexity.
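To make the idea of partitioning a latent space with Half-Space Trees more concrete, here is a minimal Python sketch of mass-based scoring. It is a simplified illustration, not the ARES implementation: it assumes embeddings are already scaled to the unit hypercube, keeps a single mass profile instead of the reference and latest windows used by streaming HST, and omits the GNN encoder and the dual-ensemble design. The class and function names are hypothetical. In this formulation a low score means the point falls in a sparsely populated region, so low scores indicate anomalies.

```python
import random

class HalfSpaceTree:
    """Complete binary tree that halves a random dimension at each level."""

    def __init__(self, dim, depth, rng):
        self.depth = depth
        n_nodes = 2 ** (depth + 1) - 1      # heap layout: children of i are 2i+1, 2i+2
        self.mass = [0] * n_nodes           # number of points seen in each region
        self.split_dim = [0] * n_nodes
        self.split_val = [0.0] * n_nodes
        self._build(0, 0, [0.0] * dim, [1.0] * dim, rng)

    def _build(self, node, level, lo, hi, rng):
        if level == self.depth:
            return
        q = rng.randrange(len(lo))          # random split dimension
        mid = (lo[q] + hi[q]) / 2.0         # split the current range in half
        self.split_dim[node], self.split_val[node] = q, mid
        left_hi = hi.copy()
        left_hi[q] = mid
        right_lo = lo.copy()
        right_lo[q] = mid
        self._build(2 * node + 1, level + 1, lo, left_hi, rng)
        self._build(2 * node + 2, level + 1, right_lo, hi, rng)

    def _path(self, x):
        """Yield (node, level) pairs from the root down to the leaf containing x."""
        node = 0
        for level in range(self.depth + 1):
            yield node, level
            if level < self.depth:
                go_left = x[self.split_dim[node]] < self.split_val[node]
                node = 2 * node + 1 if go_left else 2 * node + 2

    def update(self, x):
        for node, _ in self._path(x):
            self.mass[node] += 1

    def score(self, x, size_limit=1):
        """Mass of the terminal region times 2**level; low means anomalous."""
        for node, level in self._path(x):
            if level == self.depth or self.mass[node] <= size_limit:
                return self.mass[node] * 2 ** level

def ensemble_score(trees, x):
    return sum(tree.score(x) for tree in trees)

rng = random.Random(0)
trees = [HalfSpaceTree(dim=2, depth=4, rng=rng) for _ in range(5)]
for _ in range(200):                        # a dense cluster of "normal" embeddings
    for tree in trees:
        tree.update([0.2, 0.2])

normal_score = ensemble_score(trees, [0.2, 0.2])
outlier_score = ensemble_score(trees, [0.9, 0.9])
```

Because the point `[0.9, 0.9]` lands in a region that received no mass, its ensemble score is far below that of the cluster point, which is the property an HST-based detector exploits.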
