ARES: Anomaly Recognition Model For Edge Streams

Simone Mungari, Albert Bifet, Giuseppe Manco, Bernhard Pfahringer

TL;DR

The paper tackles real-time anomaly detection in edge streams by modeling temporal edge interactions as graphs and proposing ARES, an unsupervised framework that combines Graph Neural Network embeddings with Half-Space Trees for fast anomaly scoring. ARES uses a Graph Autoencoder with a GraphSAGE encoder to generate node and edge embeddings, and employs dual Half-Space Tree ensembles to produce per-edge anomaly scores, enabling detection of both spike and burst anomalies in streaming graphs. To translate anomaly scores into actionable decisions, the method includes a simple, validation-based adaptive thresholding mechanism using the Gini index, allowing domain-robust discrimination with minimal supervision. Extensive experiments on seven real-world cybersecurity datasets show that ARES generally outperforms state-of-the-art methods in ROC-AUC and AP, with favorable scalability and robustness to concept drift, supported by ablation studies validating the effectiveness of the GNN+HST combination.
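The thresholding step can be illustrated with a minimal sketch. The summary only states that a Gini index is used on a small labeled validation set, so the concrete procedure below (scan the midpoints between sorted validation scores and keep the cut with the lowest size-weighted Gini impurity) is an assumption, and the names `gini_impurity`, `pick_threshold`, `val_scores`, and `val_labels` are hypothetical. The procedure in the paper may differ.

```python
from typing import Sequence


def gini_impurity(labels: Sequence[int]) -> float:
    """Gini impurity of a binary label set: 1 - p0^2 - p1^2."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def pick_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the cut that minimizes the size-weighted Gini impurity.

    Assumes at least one validation score; label 1 marks an anomaly.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    best_cut, best_impurity = pairs[0][0], float("inf")
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no cut can separate two equal scores
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini_impurity(left)
                    + len(right) * gini_impurity(right)) / n
        if impurity < best_impurity:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best_impurity = impurity
    return best_cut


# Hypothetical validation data: anomaly scores with 0/1 labels.
val_scores = [0.05, 0.10, 0.12, 0.20, 0.80, 0.90]
val_labels = [0, 0, 0, 0, 1, 1]
threshold = pick_threshold(val_scores, val_labels)
```

On this toy input the two classes separate cleanly, so the selected cut falls midway between the highest normal score and the lowest anomalous score. Edges scoring above `threshold` would then be flagged.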

Abstract

Many real-world scenarios involving streaming information can be represented as temporal graphs, where data flows through dynamic changes in edges over time. Anomaly detection in this context has the objective of identifying unusual temporal connections within the graph structure. Detecting edge anomalies in real time is crucial for mitigating potential risks. Unlike traditional anomaly detection, this task is particularly challenging due to concept drifts, large data volumes, and the need for real-time response. To face these challenges, we introduce ARES, an unsupervised anomaly detection framework for edge streams. ARES combines Graph Neural Networks (GNNs) for feature extraction with Half-Space Trees (HST) for anomaly scoring. GNNs capture both spike and burst anomalous behaviors within streams by embedding node and edge properties in a latent space, while HST partitions this space to isolate anomalies efficiently. ARES operates in an unsupervised way without the need for prior data labeling. To further validate its detection capabilities, we additionally incorporate a simple yet effective supervised thresholding mechanism. This approach leverages statistical dispersion among anomaly scores to determine the optimal threshold using a minimal set of labeled data, ensuring adaptability across different domains. We validate ARES through extensive evaluations across several real-world cyber-attack scenarios, comparing its performance against existing methods while analyzing its space and time complexity.
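To make the space-partitioning idea concrete, here is a simplified sketch of generic Half-Space Tree scoring, not the ARES implementation. It assumes embeddings rescaled to the unit box, omits the reference/latest window swap used in streaming variants, and follows the common mass-based convention in which a lower score means a more anomalous point. All names (`Node`, `build`, `update`, `score`, `ensemble_score`) and parameter values are hypothetical.

```python
import random
from typing import List, Optional, Sequence


class Node:
    """One cell of a half-space tree; `mass` counts reference points seen in it."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.mass = 0
        self.dim: Optional[int] = None  # None marks a leaf
        self.split = 0.0
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def build(lo: List[float], hi: List[float], depth: int,
          max_depth: int, rng: random.Random) -> Node:
    """Recursively halve a randomly chosen dimension of the box [lo, hi]."""
    node = Node(depth)
    if depth == max_depth:
        return node
    node.dim = rng.randrange(len(lo))
    node.split = (lo[node.dim] + hi[node.dim]) / 2.0
    left_hi = list(hi)
    left_hi[node.dim] = node.split
    right_lo = list(lo)
    right_lo[node.dim] = node.split
    node.left = build(lo, left_hi, depth + 1, max_depth, rng)
    node.right = build(right_lo, hi, depth + 1, max_depth, rng)
    return node


def update(root: Node, x: Sequence[float]) -> None:
    """Add one reference point to the mass of every cell on its path."""
    node = root
    while True:
        node.mass += 1
        if node.dim is None:
            break
        node = node.left if x[node.dim] < node.split else node.right


def score(root: Node, x: Sequence[float], size_limit: int) -> int:
    """Mass of the cell where the walk stops, scaled by 2**depth."""
    node = root
    while node.dim is not None and node.mass > size_limit:
        node = node.left if x[node.dim] < node.split else node.right
    return node.mass * 2 ** node.depth


rng = random.Random(0)
trees = [build([0.0, 0.0], [1.0, 1.0], 0, 4, rng) for _ in range(10)]

# Reference window: a deterministic grid clustered around (0.2, 0.2).
reference = [(0.1 + 0.01 * i, 0.1 + 0.01 * j)
             for i in range(21) for j in range(21)]
for tree in trees:
    for point in reference:
        update(tree, point)


def ensemble_score(x: Sequence[float], size_limit: int = 5) -> int:
    """Sum of per-tree mass scores; low values indicate sparse regions."""
    return sum(score(tree, x, size_limit) for tree in trees)


typical_score = ensemble_score((0.2, 0.2))
outlier_score = ensemble_score((0.9, 0.9))
```

A point far from the reference cluster lands in cells that received no mass, so its score is low, while a point inside the cluster accumulates mass at every tree. No distance computation is needed at scoring time, which is what makes this family of detectors attractive for streams.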

Paper Structure

This paper contains 29 sections, 5 equations, 4 figures, 16 tables, 1 algorithm.

Figures (4)

  • Figure 1: Connections among endpoints over time (T1, T2, T3). Panels (a) and (b) depict the typical behavior of connections among endpoints. Panel (c) shows a significant increase in connections from $s_2, \ldots, s_6$ to $s_1$.
  • Figure 2: Framework overview. Given a stream of edges denoted as $\mathcal{G}$ (1), we use a pre-trained Graph Neural Network, $GNN^{enc}$ (2), to derive a set of node embeddings, represented as $H_{t}^v$ (3). We also generate a set of edge embeddings denoted as $H_{t}^e$ (4). These embeddings serve as inputs to two Half-Space Trees, designated as $HST^v$ and $HST^e$, which encode unexpectedness for both nodes and edges according to their positioning in the latent space (5). The final anomaly score (6) is then computed by combining information from these trees at both the node and the edge level.
  • Figure 3: Comparison of ROC-AUC scores over time among $\mathrm{ARES}$-Static, $\mathrm{ARES}$-Dynamic, MIDAS, and SLADE-H. The best variant of MIDAS is taken into account for each dataset, as reported in Table \ref{tab:auc_ap_results}.
  • Figure 4: ARES-Static timings on the CIC-IDS2017 dataset as the number of edges increases.