A Comparative Analysis of DNN-based White-Box Explainable AI Methods in Network Security

Osvaldo Arreche, Mustafa Abdallah

TL;DR

This paper tackles the interpretability gap in neural-network-based network intrusion detection by deploying a white-box XAI framework that uses Layer-wise Relevance Propagation (LRP), Integrated Gradients (IG), and DeepLift to generate explanations. It evaluates these methods with six metrics (descriptive accuracy, sparsity, stability, robustness, efficiency, and completeness) on three datasets (NSL-KDD, CICIDS-2017, and RoEduNet-SIMARGL2021). The results show that white-box approaches generally yield robust and complete explanations and often outperform black-box baselines. The authors provide an end-to-end pipeline, detailed metric algorithms, and open-source code to support reproducibility and community extension. The work also discusses practical considerations for deploying XAI in real-time IDS and suggests directions for improving robustness and efficiency. It offers a benchmark and methodology for future research in explainable security analytics.
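As a rough illustration of one of the three white-box methods, the sketch below approximates Integrated Gradients with a midpoint Riemann sum along the straight-line path from a baseline x' to an input x. It makes several assumptions. It uses a toy logistic "intrusion score" model with an analytic gradient. The weights `W` and `B` are made up for illustration and are not the paper's NIDS network. It also assumes an all-zero baseline. The paper's own implementation and baseline choice may differ. The final check illustrates IG's completeness axiom (the attributions sum to F(x) - F(x')). That axiom is a property of IG itself and is not necessarily the same as the paper's completeness metric.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical toy "intrusion score" model: F(x) = sigmoid(w . x + b).
W = np.array([1.5, -2.0, 0.5, 0.0])
B = -0.25


def model(x):
    return sigmoid(x @ W + B)


def model_grad(x):
    # dF/dx = F(x) * (1 - F(x)) * w for the logistic model above.
    f = model(x)
    return f * (1.0 - f) * W


def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([model_grad(baseline + a * (x - baseline)) for a in alphas])
    avg_grad = grads.mean(axis=0)
    return (x - baseline) * avg_grad


x = np.array([0.8, 0.1, 0.3, 0.9])  # one (already normalized) flow record
baseline = np.zeros_like(x)         # assumed all-zero reference input

attr = integrated_gradients(x, baseline)
print("attributions:", np.round(attr, 4))
# Completeness axiom: attributions sum (approximately) to F(x) - F(baseline).
print("sum:", attr.sum(), "F(x) - F(x'):", model(x) - model(baseline))
```

Feature 3 has a zero weight, so it receives zero attribution even though its input value is large. This shows why IG scores reflect both the input and the model's sensitivity.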

Abstract

Motivated by the ever-growing number of intrusions on networked systems, new research focuses on creating artificial intelligence (AI) solutions for network intrusion detection systems (NIDS). These solutions, however, are increasingly complex and hard to interpret. Hence, the use of explainable AI (XAI) techniques in real-world intrusion detection systems stems from the need to understand black-box AI models and explain them to security analysts. To meet this need, this paper applies and evaluates white-box XAI techniques (particularly LRP, IG, and DeepLift) for NIDS. The evaluation uses an end-to-end framework for neural network models and three widely used network intrusion datasets (NSL-KDD, CICIDS-2017, and RoEduNet-SIMARGL2021). It assesses both global and local explanation scopes and examines six distinct evaluation metrics (descriptive accuracy, sparsity, stability, robustness, efficiency, and completeness). We also compare the performance of white-box XAI methods with that of black-box XAI methods. The results show that white-box XAI techniques score high in robustness and completeness, which are crucial metrics for IDS. Moreover, the source code developed for our XAI evaluation framework is publicly available for the research community to use and improve.
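For LRP, a minimal sketch of the commonly used epsilon rule on a two-layer ReLU network is shown below. The weights are random placeholders rather than the paper's trained model. Biases are omitted so that relevance is (approximately) conserved from the output score down to the input features, and `eps` is an assumed stabilizer. The LRP variant and hyperparameters used in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bias-free two-layer ReLU network: 6 input features -> 8 hidden -> 2 classes.
W1 = rng.normal(size=(6, 8))
W2 = rng.normal(size=(8, 2))


def forward(x):
    h = np.maximum(0.0, x @ W1)  # hidden ReLU activations
    out = h @ W2                 # class scores (logits)
    return h, out


def lrp_dense(a, W, R_out, eps=1e-6):
    """Epsilon-LRP for one bias-free dense layer: redistribute R_out onto inputs a."""
    z = a @ W                                  # pre-activations of the upper layer
    z = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabilize small denominators
    s = R_out / z
    return a * (W @ s)                         # R_j = a_j * sum_k w_jk * R_k / z_k


def lrp_explain(x, target):
    h, out = forward(x)
    R_out = np.zeros_like(out)
    R_out[target] = out[target]                # start from the target class score
    R_hidden = lrp_dense(h, W2, R_out)
    return lrp_dense(x, W1, R_hidden), out


x = np.abs(rng.normal(size=6))                 # one normalized, non-negative record
target = 1
R, out = lrp_explain(x, target)
print("relevances:", np.round(R, 4))
print("sum of relevances:", R.sum(), "target score:", out[target])
```

The printed sum of input relevances should closely match the target class score. This conservation property is what makes LRP relevances directly comparable to the model output.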

Paper Structure

This paper contains 34 sections, 8 figures, and 14 tables.

Figures (8)

  • Figure 1: A diagram of the XAI framework for evaluating network intrusion detection. It covers six evaluation metrics, three white-box XAI methods, a neural network AI model, and three network intrusion datasets.
  • Figure 2: The Descriptive Accuracy experiment using the DeepLift, IG, and LRP white-box XAI methods. The graph shows accuracy declining as the most important intrusion features are removed (x-axis). This demonstrates the methods' effectiveness for global explainability on the three datasets.
  • Figure 3: Sparsity plots for the LRP, IG, and DeepLift XAI techniques on the three datasets. The methods perform comparably across datasets. On CICIDS-2017, however, IG and LRP perform best.
  • Figure 4: An illustration of a DoS instance from the CICIDS-2017 dataset in the Robustness experiment using DeepLift. Panel (a) shows the feature ranking under the biased explanation, with flow duration as the top feature. Panel (b) shows the ranking after classification by the adversarial model, with the engineered feature as the top feature.
  • Figure 5: The percentage of data samples for which the biased and unrelated features appear among the top-3 features (according to DeepLift feature-importance rankings). Panel (a) shows the biased classifier. Panels (b), (c), and (d) show the adversarial classifier, which uses one uncorrelated feature for each dataset. Panel (c) shows the best result: it is barely influenced by the unrelated feature, and the biased feature appears in the third position.
  • ...and 3 more figures
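The figure captions describe three measurements: accuracy as top-ranked features are removed (Figure 2), sparsity curves (Figure 3), and how often a given feature lands in the top-3 of an explanation (Figure 5). The sketch below shows one plausible way to compute each from a matrix of attributions, with one row per sample and one column per feature. The exact definitions used in the paper are not given here. The function names, the masking-by-constant strategy, the per-sample max normalization, and the thresholds are all assumptions for illustration.

```python
import numpy as np


def descriptive_accuracy(predict, X, y, ranking, ks, fill_value=0.0):
    """Accuracy after masking the top-k globally ranked features, for each k in ks."""
    accs = []
    for k in ks:
        X_masked = X.copy()
        X_masked[:, ranking[:k]] = fill_value  # assumed masking: overwrite with a constant
        accs.append(float(np.mean(predict(X_masked) == y)))
    return accs


def sparsity_curve(attributions, thresholds):
    """Fraction of features whose normalized |attribution| is below each threshold."""
    a = np.abs(attributions)
    a = a / np.maximum(a.max(axis=1, keepdims=True), 1e-12)  # per-sample scaling to [0, 1]
    return [float(np.mean(a < t)) for t in thresholds]


def topk_frequency(attributions, feature_idx, k=3):
    """Share of samples in which feature_idx is among the k largest |attributions|."""
    top = np.argsort(-np.abs(attributions), axis=1)[:, :k]
    return float(np.mean(np.any(top == feature_idx, axis=1)))


# Toy usage with a hypothetical rule-based "classifier" that only looks at feature 0.
rng = np.random.default_rng(1)
X = rng.uniform(size=(500, 5))
y = (X[:, 0] > 0.5).astype(int)


def predict(Z):
    return (Z[:, 0] > 0.5).astype(int)


attr = X * np.array([1.0, 0.1, 0.05, 0.0, 0.0])   # made-up attributions
ranking = np.argsort(-np.abs(attr).mean(axis=0))  # global ranking by mean |attribution|

print(descriptive_accuracy(predict, X, y, ranking, ks=[0, 1, 2]))
print(sparsity_curve(attr, thresholds=[0.1, 0.5]))
print(topk_frequency(attr, feature_idx=0, k=3))
```

A steep accuracy drop after masking only the first ranked feature suggests that the ranking captured what the model actually relies on. This is the behavior the Descriptive Accuracy plots look for.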