Are We There Yet? Unraveling the State-of-the-Art Graph Network Intrusion Detection Systems

Chenglong Wang, Pujia Zheng, Jiaping Gui, Cunqing Hua, Wajih Ul Hassan

TL;DR

The paper tackles the reproducibility and replicability challenges of state-of-the-art graph-based NIDS by proposing GidsRep, a framework that combines artifact reuse and re-implementation to evaluate five SOTA GIDS (Anomal-E, VGRNN, PIKACHU, EULER, ARGUS) across three public datasets (LANL, OpTC, CIC-IDS-2017) and a large-scale enterprise dataset, including adversarial robustness tests. It uncovers significant performance discrepancies across datasets and sensitivity to hyperparameters, with limited generalization to enterprise traffic and notable scalability and memory constraints on large graphs. Adversarial evasion remains a concern for several models, underscoring the need for defense mechanisms and robust optimization. The work contributes practical recommendations for rigorous reproduction/replication, improves reporting standards, and provides an open enterprise dataset to drive more generalizable and robust graph-based intrusion detection research.

Abstract

Network Intrusion Detection Systems (NIDS) are vital for ensuring enterprise security. Recently, Graph-based NIDS (GIDS) have attracted considerable attention because of their capability to effectively capture the complex relationships within the graph structures of data communications. Despite their promise, the reproducibility and replicability of these GIDS remain largely unexplored, posing challenges for developing reliable and robust detection systems. This study bridges this gap by designing a systematic approach to evaluate state-of-the-art GIDS, which includes critically assessing, extending, and clarifying the findings of these systems. We further assess the robustness of GIDS under adversarial attacks. Evaluations were conducted on three public datasets as well as a newly collected large-scale enterprise dataset. Our findings reveal significant performance discrepancies, highlighting challenges related to dataset scale, model inputs, and implementation settings. We demonstrate difficulties in reproducing and replicating results, particularly concerning false positive rates and robustness against adversarial attacks. This work provides valuable insights and recommendations for future research, emphasizing the importance of rigorous reproduction and replication studies in developing robust and generalizable GIDS solutions.

Paper Structure

This paper contains 16 sections, 2 equations, 7 figures, 14 tables.

Figures (7)

  • Figure 1: Workflow of GidsRep
  • Figure 2: Impact of key implementation parameters on AP. Notice that some models lack results corresponding to certain parameters due to the absence of those parameters. The parameter-description table in the appendix explains each parameter on the X axis.
  • Figure 3: Performance comparison of target models on the LANL and OpTC datasets. Notice that some original results of VGRNN, PIKACHU and EULER are not plotted in the figure since they are not provided in their original papers.
  • Figure 4: Memory usage and training/testing time of the target models. The red horizontal line represents the maximum memory or the time budgets available in the experimental environment.
  • Figure 5: Impact of key implementation parameters on AUC.
  • ...and 2 more figures