AutoGraphAD: A novel approach using Variational Graph Autoencoders for anomalous network flow detection

Georgios Anyfantis, Pere Barlet-Ros

TL;DR

This paper tackles the challenge of network intrusion detection with limited labelled data by proposing AutoGraphAD, an unsupervised anomaly-detection framework that builds time-windowed heterogeneous graphs (IP and connection nodes) and trains a variational graph autoencoder with contrastive objectives. The method relies solely on reconstruction errors and KL divergence to derive anomaly scores, avoiding costly downstream detectors. AutoGraphAD demonstrates comparable or superior performance to the state-of-the-art unsupervised method Anomal-E on UNSW-NB15, while offering significantly faster training and inference and smaller embedding sizes. The approach shows practical potential for real-time NIDS deployment due to its GPU-native implementation, tunable anomaly scoring, and lack of dependence on external anomaly-detector components.
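
To make the graph-building step concrete, here is a minimal sketch of how flows could be grouped into time windows and turned into graphs with IP nodes and connection nodes. The flow tuple layout, the function name `build_window_graphs`, the edge directions (source IP to connection, connection to destination IP), and the dictionary output are all assumptions made for illustration; the summary above does not specify the paper's actual schema or window length.

```python
from collections import defaultdict


def build_window_graphs(flows, window_seconds):
    """Group flows into fixed time windows and build one graph per window.

    flows: iterable of (timestamp_seconds, src_ip, dst_ip) tuples (assumed layout).
    Returns {window_index: graph}. Each graph holds IP nodes, connection nodes,
    and edges linking every connection to its source and destination IP.
    """
    by_window = defaultdict(list)
    for ts, src, dst in flows:
        by_window[int(ts // window_seconds)].append((src, dst))

    graphs = {}
    for window, window_flows in sorted(by_window.items()):
        ip_index = {}          # IP address -> IP node id within this window
        ip_to_conn = []        # (ip_node_id, connection_node_id) edges
        conn_to_ip = []        # (connection_node_id, ip_node_id) edges
        for conn_id, (src, dst) in enumerate(window_flows):
            s = ip_index.setdefault(src, len(ip_index))
            d = ip_index.setdefault(dst, len(ip_index))
            ip_to_conn.append((s, conn_id))
            conn_to_ip.append((conn_id, d))
        graphs[window] = {
            "ip_nodes": list(ip_index),            # insertion order matches node ids
            "num_connections": len(window_flows),  # one connection node per flow
            "ip_to_conn": ip_to_conn,
            "conn_to_ip": conn_to_ip,
        }
    return graphs


# Hypothetical flows: two in the first 60-second window, one in the second.
example_flows = [
    (0.5, "10.0.0.1", "10.0.0.2"),
    (1.0, "10.0.0.1", "10.0.0.3"),
    (65.0, "10.0.0.2", "10.0.0.1"),
]
example_graphs = build_window_graphs(example_flows, window_seconds=60)
```

In this sketch each flow becomes one connection node, so per-flow features would attach to connection nodes while IP nodes only tie flows together. In practice the edge lists would be converted to tensors for a GNN library.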

Abstract

Network Intrusion Detection Systems (NIDS) are essential tools for detecting network attacks and intrusions. While extensive research has explored the use of supervised Machine Learning for attack detection and characterisation, these methods require accurately labelled datasets, which are very costly to obtain. Moreover, existing public datasets contain limited and/or outdated attacks, and many of them suffer from mislabelled data. To reduce the reliance on labelled data, we propose AutoGraphAD, a novel unsupervised anomaly detection approach based on a Heterogeneous Variational Graph Autoencoder. AutoGraphAD operates on heterogeneous graphs, made from connection and IP nodes, that capture network activity within a time window. The model is trained using unsupervised and contrastive learning, without relying on any labelled data. The reconstruction loss, structural loss, and KL divergence are then weighted and combined into an anomaly score used for anomaly detection. Overall, AutoGraphAD yields the same, and in some cases better, results than previous unsupervised approaches, such as Anomal-E, but without requiring costly downstream anomaly detectors. As a result, AutoGraphAD achieves around 1.18 orders of magnitude faster training (roughly 15x) and 1.03 orders of magnitude faster inference (roughly 11x), which represents a significant advantage for operational deployment.
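
The weighted anomaly score described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the use of mean squared error for the feature reconstruction term, the closed-form KL divergence against a standard normal prior (the usual choice for variational autoencoders), and the default weights are all hypothetical. The structural loss is passed in as a precomputed per-node value because its exact form is not given here.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one node's latent vector."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def feature_error(x, x_hat):
    """Mean squared error between original and reconstructed node features."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def anomaly_score(feat_err, struct_err, kl, w_feat=1.0, w_struct=1.0, w_kl=1.0):
    """Weighted sum of the three terms; the weights are tunable assumptions."""
    return w_feat * feat_err + w_struct * struct_err + w_kl * kl


# Hypothetical connection node: features, reconstruction, and latent parameters.
x, x_hat = [1.0, 2.0], [1.0, 4.0]
mu, logvar = [0.0, 0.0], [0.0, 0.0]

score = anomaly_score(
    feature_error(x, x_hat),            # 2.0
    0.5,                                # assumed precomputed structural loss
    kl_to_standard_normal(mu, logvar),  # 0.0 when the posterior equals the prior
    w_feat=0.5,
    w_struct=2.0,
)
```

A node would then be flagged as anomalous when its score exceeds a chosen threshold, for example a high percentile of the scores observed on training data. That thresholding rule is also an assumption of this sketch.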

Paper Structure

This paper contains 25 sections, 7 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Dataset Pre-processing pipeline for graph generation
  • Figure 2: The proposed architecture of AutoGraphAD in a training setting. AutoGraphAD focuses mainly on reconstructing and using the connection nodes, since the IP nodes carry placeholder values. The pipeline starts with the encoder, which generates the latent-space embeddings; these are reparameterised to produce the embeddings used for structure and feature reconstruction. The reconstructed values are then used to calculate the reconstruction losses for back-propagation. The GNN encoder and decoder can be swapped for any other GNN algorithm, and the same applies to the losses used in back-propagation.
  • Figure 3: Performance metrics at 0% contamination in the training dataset.
  • Figure 4: Performance metrics at 3.36% contamination in the training dataset.
  • Figure 5: Performance metrics of all the approaches at 5.76% contamination.
  • ...and 4 more figures