Explainable AI for Comparative Analysis of Intrusion Detection Models

Pap M. Corea, Yongxin Liu, Jian Wang, Shuteng Niu, Houbing Song

TL;DR

This work tackles the problem of interpretability in intrusion detection by applying Occlusion Sensitivity to a diverse set of classifiers trained on the UNSW-NB15 dataset. The study demonstrates that most models rely on a small subset of features (often fewer than three) to achieve high accuracy, emphasizing the potential for targeted feature engineering over increasingly complex models. Among the evaluated methods, Random Forest consistently delivers superior robustness, time efficiency, and overall accuracy, underscoring its practicality for IDS deployments. The results highlight the value of explainable AI for diagnosing model behavior and guiding feature engineering, with the authors providing data and code to support reproducibility and further research.

Abstract

Explainable Artificial Intelligence (XAI) has become a widely discussed topic, and the related technologies facilitate a better understanding of conventional black-box models such as Random Forest and Neural Networks. However, domain-specific applications of XAI remain insufficient. To fill this gap, this research uses occlusion sensitivity to analyze various machine learning models applied to binary and multi-class classification for intrusion detection from network traffic, all on the same dataset. The models evaluated include Linear Regression, Logistic Regression, Linear Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest, Decision Trees, and Multi-Layer Perceptrons (MLP). We trained all models to an accuracy of 90% on the UNSW-NB15 dataset. We found that most classifiers rely on fewer than three critical features to achieve such accuracy, indicating that effective feature engineering could be far more important for intrusion detection than applying complicated models. We also found that Random Forest provides the best performance in terms of accuracy, time efficiency, and robustness. Data and code are available at https://github.com/pcwhy/XML-IntrusionDetection.git
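
To make the idea of occlusion sensitivity concrete, the following is a minimal sketch of how a per-feature sensitivity score could be computed for any trained classifier. It is not the authors' implementation: the function name `occlusion_sensitivity`, the choice to occlude a feature by replacing it with its column mean, and the synthetic data and `toy_predict` stand-in model are all assumptions made for illustration. The abstract does not specify how the paper occludes features or measures the effect.

```python
import numpy as np

def occlusion_sensitivity(predict, X, y, fill_values=None):
    """Return baseline accuracy and the accuracy drop per occluded feature.

    predict: callable mapping an (n, d) array to n predicted labels.
    fill_values: value used to overwrite each feature; defaults to the
    column mean (an assumption, other occlusion values are possible).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    if fill_values is None:
        fill_values = X.mean(axis=0)
    baseline = np.mean(predict(X) == y)
    drops = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_occ = X.copy()
        X_occ[:, j] = fill_values[j]  # hide feature j from the model
        drops[j] = baseline - np.mean(predict(X_occ) == y)
    return baseline, drops

# Synthetic stand-in for network-traffic features (hypothetical data):
# only features 0 and 1 carry any information about the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

def toy_predict(X):
    # Stand-in for a trained classifier that learned the true rule.
    return (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

baseline, drops = occlusion_sensitivity(toy_predict, X, y)
ranking = np.argsort(drops)[::-1]  # most sensitive feature first
print("baseline accuracy:", baseline)
print("accuracy drop per feature:", np.round(drops, 3))
```

In this toy setting, occluding either of the two informative features lowers accuracy while occluding the remaining three leaves it unchanged, which mirrors the kind of finding the abstract reports: a classifier can reach high accuracy while depending on only a handful of features. Any object with a compatible prediction function (for example, the `predict` method of a fitted scikit-learn estimator) could be passed in place of `toy_predict`.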

Paper Structure (11 sections, 11 figures)


Figures (11)

  • Figure 1: Distribution of intrusion attack categories after data preprocessing.
  • Figure 2: Feature correlation matrix.
  • Figure 3: Selected features for binary classifiers.
  • Figure 4: Selected features for multi-class classifiers.
  • Figure 5: Feature sensitivity of intrusion detection model classifiers trained with complete features.
  • ...and 6 more figures