Self-Supervised Learning of Graph Representations for Network Intrusion Detection

Lorenzo Guerra, Thomas Chapuis, Guillaume Duc, Pavlo Mozharovskyi, Van-Tam Nguyen

TL;DR

GraphIDS tackles network intrusion detection under limited supervision by unifying local graph representation learning with global co-occurrence modeling in an end-to-end framework. It combines a 1-hop E-GraphSAGE encoder to embed flows with local topology and a Transformer-based masked autoencoder to reconstruct these embeddings, training to minimize reconstruction error on benign traffic. The approach delivers state-of-the-art performance on NetFlow-based benchmarks, with substantial gains in PR-AUC and macro F1 across v2 and v3 feature sets, and demonstrates robustness to unseen attacks. This work highlights the practical value of jointly modeling network topology and global flow co-occurrence for real-time NIDS without reliance on labeled anomaly data.

Abstract

Detecting intrusions in network traffic is a challenging task, particularly under limited supervision and constantly evolving attack patterns. While recent works have leveraged graph neural networks for network intrusion detection, they often decouple representation learning from anomaly detection, limiting the utility of the embeddings for identifying attacks. We propose GraphIDS, a self-supervised intrusion detection model that unifies these two stages by learning local graph representations of normal communication patterns through a masked autoencoder. An inductive graph neural network embeds each flow with its local topological context to capture typical network behavior, while a Transformer-based encoder-decoder reconstructs these embeddings, implicitly learning global co-occurrence patterns via self-attention without requiring explicit positional information. During inference, flows with unusually high reconstruction errors are flagged as potential intrusions. This end-to-end framework ensures that embeddings are directly optimized for the downstream task, facilitating the recognition of malicious traffic. On diverse NetFlow benchmarks, GraphIDS achieves up to 99.98% PR-AUC and 99.61% macro F1-score, outperforming baselines by 5-25 percentage points.
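
The inference rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the anomaly score is the per-flow mean squared error between an embedding and its reconstruction, and that the decision threshold is a high percentile of the scores observed on benign validation traffic. The function names and the percentile rule are hypothetical; the abstract specifies neither.

```python
from statistics import quantiles


def reconstruction_errors(embeddings, reconstructions):
    """Per-flow mean squared error between an embedding and its reconstruction."""
    scores = []
    for original, rebuilt in zip(embeddings, reconstructions):
        squared = [(o - r) ** 2 for o, r in zip(original, rebuilt)]
        scores.append(sum(squared) / len(squared))
    return scores


def pick_threshold(benign_scores, percentile=99):
    """Assumed rule: use a high percentile of benign validation scores."""
    # quantiles(..., n=100) returns the 99 cut points between percentiles.
    cut_points = quantiles(benign_scores, n=100, method="inclusive")
    return cut_points[percentile - 1]


def flag_flows(scores, threshold):
    """Mark a flow as a potential intrusion when its score exceeds the threshold."""
    return [score > threshold for score in scores]
```

In use, the threshold would be fitted once on scores from benign traffic and then applied to the scores of new flows. Threshold-free metrics such as PR-AUC only need the raw scores.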

Paper Structure

This paper contains 37 sections, 3 equations, 12 figures, and 14 tables.

Figures (12)

  • Figure 1: Overview of GraphIDS: the model detects network intrusions by evaluating the reconstruction error of graph-based flow embeddings. Flows representing attacks (highlighted in red) typically yield higher reconstruction errors, as they deviate from normal communication patterns.
  • Figure 2: Illustration of the graph construction process. Network flows are transformed into a directed graph, where nodes represent hosts (IP addresses) and edges correspond to communication flows between them. Edge features capture flow statistics such as packet count, byte count, and protocol information.
  • Figure 3: Overview of the full training pipeline. During inference, attention masking is omitted. The reconstruction errors $s_1, s_2, \dots, s_n$ serve as anomaly scores for each network flow.
  • Figure 4: Precision-recall curves for all models on each dataset.
  • Figure 5: Anomaly score by attack type in NF-UNSW-NB15-v3.
  • ...and 7 more figures
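
The graph construction described in Figure 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the `Flow` record, its field names, and `build_flow_graph` are hypothetical, and the three edge features stand in for whatever NetFlow fields a given dataset provides. Each flow becomes one directed edge, so repeated communication between the same pair of hosts yields parallel edges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record; real NetFlow data carries many more fields."""
    src_ip: str
    dst_ip: str
    packet_count: int
    byte_count: int
    protocol: int


def build_flow_graph(flows):
    """Turn flows into a directed multigraph with hosts as nodes.

    Returns a host-to-index mapping, a list of (source, destination) index
    pairs with one entry per flow, and a parallel list of edge feature vectors.
    """
    hosts = sorted({ip for flow in flows for ip in (flow.src_ip, flow.dst_ip)})
    node_index = {ip: i for i, ip in enumerate(hosts)}
    edges = [(node_index[f.src_ip], node_index[f.dst_ip]) for f in flows]
    edge_features = [[f.packet_count, f.byte_count, f.protocol] for f in flows]
    return node_index, edges, edge_features


flows = [
    Flow("10.0.0.1", "10.0.0.2", packet_count=12, byte_count=3400, protocol=6),
    Flow("10.0.0.2", "10.0.0.3", packet_count=3, byte_count=180, protocol=17),
    Flow("10.0.0.1", "10.0.0.2", packet_count=40, byte_count=52000, protocol=6),
]
node_index, edges, edge_features = build_flow_graph(flows)
```

Keeping the features on the edges rather than on the nodes matches the caption: the hosts carry no attributes of their own, and all flow statistics live on the connections between them.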