Are We There Yet? Unraveling the State-of-the-Art Graph Network Intrusion Detection Systems

Chenglong Wang; Pujia Zheng; Jiaping Gui; Cunqing Hua; Wajih Ul Hassan

TL;DR

The paper addresses the reproducibility and replicability of state-of-the-art graph-based NIDS (GIDS). It proposes GidsRep, a framework that combines artifact reuse and re-implementation to evaluate five GIDS (Anomal-E, VGRNN, PIKACHU, EULER, ARGUS) on three public datasets (LANL, OpTC, CIC-IDS-2017) and a large-scale enterprise dataset, including adversarial robustness tests. The evaluation uncovers significant performance discrepancies across datasets, sensitivity to hyperparameters, limited generalization to enterprise traffic, and notable scalability and memory constraints on large graphs. Adversarial evasion remains a concern for several models, underscoring the need for defense mechanisms and robust optimization. The work contributes practical recommendations for rigorous reproduction and replication, improves reporting standards, and provides an open enterprise dataset to drive more generalizable and robust graph-based intrusion detection research.

Abstract

Network Intrusion Detection Systems (NIDS) are vital for ensuring enterprise security. Recently, Graph-based NIDS (GIDS) have attracted considerable attention because of their capability to effectively capture the complex relationships within the graph structures of data communications. Despite their promise, the reproducibility and replicability of these GIDS remain largely unexplored, posing challenges for developing reliable and robust detection systems. This study bridges this gap by designing a systematic approach to evaluate state-of-the-art GIDS, which includes critically assessing, extending, and clarifying the findings of these systems. We further assess the robustness of GIDS under adversarial attacks. Evaluations were conducted on three public datasets as well as a newly collected large-scale enterprise dataset. Our findings reveal significant performance discrepancies, highlighting challenges related to dataset scale, model inputs, and implementation settings. We demonstrate difficulties in reproducing and replicating results, particularly concerning false positive rates and robustness against adversarial attacks. This work provides valuable insights and recommendations for future research, emphasizing the importance of rigorous reproduction and replication studies in developing robust and generalizable GIDS solutions.
