Detecting Cybersecurity Threats by Integrating Explainable AI with SHAP Interpretability and Strategic Data Sampling

Norrakith Srisumrith, Sunantha Sodsee

TL;DR

This integrated Explainable AI (XAI) framework demonstrates that explainability, computational efficiency, and experimental integrity can be achieved simultaneously, providing a robust foundation for deploying trustworthy AI systems in security operations centers, where decision transparency is paramount.

Abstract

The critical need for transparent and trustworthy machine learning in cybersecurity operations drives the development of this integrated Explainable AI (XAI) framework. Our methodology addresses three fundamental challenges in deploying AI for threat detection: (1) handling massive datasets through a Strategic Sampling Methodology that preserves class distributions while enabling efficient model development; (2) ensuring experimental rigor via Automated Data Leakage Prevention, which systematically identifies and removes contaminated features; and (3) providing operational transparency through an Integrated XAI Implementation that uses SHAP analysis for model-agnostic interpretability across algorithms. Applied to the CIC-IDS2017 dataset, our approach maintains detection efficacy while reducing computational overhead and delivering actionable explanations for security analysts. The framework demonstrates that explainability, computational efficiency, and experimental integrity can be achieved simultaneously, providing a robust foundation for deploying trustworthy AI systems in security operations centers, where decision transparency is paramount.
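
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one common way to draw a subsample that preserves class distributions: proportional random sampling within each class. The function name `stratified_sample`, the `fraction` and `seed` parameters, and the toy labels are all hypothetical and are not taken from the paper or from CIC-IDS2017.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, fraction, seed=0):
    """Return sorted row indices of a subsample that keeps each class's share.

    Illustrative assumption: every class is sampled at the same rate, and
    each class keeps at least one row so rare attack types are not dropped.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for indices in by_class.values():
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Hypothetical toy labels (80% / 15% / 5% class mix).
labels = ["BENIGN"] * 800 + ["DoS"] * 150 + ["PortScan"] * 50
sample = stratified_sample(labels, fraction=0.1)
counts = Counter(labels[i] for i in sample)
print(counts)
```

With a 10% sampling rate, the subsample keeps the same 80/15/5 class mix as the full label list, which is the property the abstract attributes to the sampling step. In practice a library routine such as a stratified split would likely be used instead of hand-written code.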

Paper Structure

This paper contains 35 sections, 3 equations, 6 figures, and 8 tables.

Figures (6)

  • Figure 1: Comprehensive Experimental Framework with Integrated XAI and Multi-Split Validation
  • Figure 2: Detailed XAI Integration Workflow in Security Operations
  • Figure 3: Confusion Matrix Analysis for Optimal Configuration (XGBoost + MRMR_70% + 60-10-30)
  • Figure 4: Comprehensive ROC Analysis for Multi-class Classification
  • Figure 5: Gini-based feature importance from Gradient Boosting (XGBoost) with MRMR feature selection (70% retention) on 60-10-30 split. Features are ranked by their mean decrease in impurity, with packet length statistics showing dominant predictive power.
  • ...and 1 more figure