Detecting Cybersecurity Threats by Integrating Explainable AI with SHAP Interpretability and Strategic Data Sampling

Norrakith Srisumrith; Sunantha Sodsee

TL;DR

This integrated Explainable AI (XAI) framework demonstrates that explainability, computational efficiency, and experimental integrity can be simultaneously achieved, providing a robust foundation for deploying trustworthy AI systems in security operations centers where decision transparency is paramount.

Abstract

The critical need for transparent and trustworthy machine learning in cybersecurity operations drives the development of this integrated Explainable AI (XAI) framework. Our methodology addresses three fundamental challenges in deploying AI for threat detection: handling massive datasets through a Strategic Sampling Methodology that preserves class distributions while enabling efficient model development; ensuring experimental rigor through Automated Data Leakage Prevention, which systematically identifies and removes contaminated features; and providing operational transparency through an Integrated XAI Implementation that uses SHAP analysis for model-agnostic interpretability across algorithms. Applied to the CIC-IDS2017 dataset, our approach maintains detection efficacy while reducing computational overhead and delivering actionable explanations for security analysts. The framework demonstrates that explainability, computational efficiency, and experimental integrity can be achieved simultaneously, providing a robust foundation for deploying trustworthy AI systems in security operations centers, where decision transparency is paramount.
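The abstract does not spell out how the Strategic Sampling Methodology preserves class distributions. As a minimal sketch, assuming it amounts to per-class (stratified) sampling, the Python below draws the same fraction from every class and reports the largest shift in class share as a quick quantitative check. The function names, the keep-at-least-one-row rule, and the label counts are hypothetical illustrations, not the authors' code or CIC-IDS2017 statistics.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, fraction, seed=0):
    """Return sorted row indices of a class-proportional sample of `labels`.

    Each class contributes round(fraction * class_count) rows (0 < fraction <= 1).
    Assumption: at least one row is kept per class so rare attack types are not lost.
    """
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        rows_by_class[label].append(idx)
    chosen = []
    for rows in rows_by_class.values():
        k = max(1, round(fraction * len(rows)))
        chosen.extend(rng.sample(rows, k))  # sample without replacement within the class
    return sorted(chosen)


def class_proportions(labels):
    """Map each class to its share of the rows."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def max_proportion_shift(full_labels, sample_labels):
    """Largest absolute difference in class share between the full data and the sample."""
    p_full = class_proportions(full_labels)
    p_sample = class_proportions(sample_labels)
    return max(abs(p_full[c] - p_sample.get(c, 0.0)) for c in p_full)


if __name__ == "__main__":
    # Illustrative synthetic counts, not CIC-IDS2017 statistics.
    labels = (["BENIGN"] * 8000 + ["DoS Hulk"] * 1500
              + ["PortScan"] * 490 + ["Infiltration"] * 10)
    idx = stratified_sample(labels, fraction=0.10, seed=42)
    sample = [labels[i] for i in idx]
    print(len(sample), Counter(sample))
    print("max class-share shift:", max_proportion_shift(labels, sample))
```

Because of the keep-at-least-one-row floor, any class with fewer than about 1/fraction rows ends up slightly over-represented; whether the paper applies such a floor is not stated here.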

Paper Structure (35 sections, 3 equations, 6 figures, 8 tables)

Introduction
Related Work
Machine Learning in Cybersecurity
Ensemble Methods and Hybrid Models for Intrusion Detection
Explainable AI in Security Applications
Feature Selection and Sampling Strategies
Data Leakage Prevention in Cybersecurity ML
Integration Gaps and Research Opportunities
Comparative Analysis with State-of-the-Art
Strategic Sampling with Quantitative Validation
Temporal Validation Framework and Leakage Prevention
Experimental Framework
Robust Algorithm Selection with Cross-Configuration Validation
Data Leakage Prevention and Validation
Statistical Significance Testing
...and 20 more sections
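Several of the listed sections concern data leakage, and the abstract describes Automated Data Leakage Prevention as systematically identifying and removing contaminated features. The exact rules are not given here, so the following is only a minimal sketch under two assumptions: identifier-style columns (a hypothetical name list) are treated as leakage, and so is any feature whose values map one-to-one onto the class labels, since such a feature simply re-encodes the target. All names are hypothetical.

```python
# Hypothetical identifier-style column names (lower-cased, whitespace-stripped).
# This list is an assumption for illustration, not the paper's leakage rule set.
ID_LIKE = {"flow id", "source ip", "destination ip", "timestamp"}


def is_label_proxy(values, labels):
    """True if feature values and labels map one-to-one (the feature re-encodes the label)."""
    value_to_label, label_to_value = {}, {}
    for v, y in zip(values, labels):
        if value_to_label.setdefault(v, y) != y or label_to_value.setdefault(y, v) != v:
            return False
    return True


def flag_leaky_features(columns, labels, id_like=ID_LIKE):
    """Return {column name: reason} for columns that look like leakage.

    `columns` maps a column name to its list of values, aligned with `labels`.
    """
    flagged = {}
    for name, values in columns.items():
        if name.strip().lower() in id_like:
            flagged[name] = "identifier-like column"
        elif is_label_proxy(values, labels):
            flagged[name] = "one-to-one with the label"
    return flagged


def drop_features(columns, flagged):
    """Return a copy of `columns` without the flagged features."""
    return {name: vals for name, vals in columns.items() if name not in flagged}


if __name__ == "__main__":
    labels = ["BENIGN", "DoS Hulk", "BENIGN", "PortScan", "DoS Hulk"]
    columns = {
        " Source IP": ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.2"],
        "Attack Code": [0, 1, 0, 2, 1],  # hypothetical column that encodes the label
        "Flow Duration": [120, 5, 98, 3, 7],
        "Protocol": [6, 6, 17, 6, 6],
    }
    flags = flag_leaky_features(columns, labels)
    print(flags)
    print(sorted(drop_features(columns, flags)))
```

The one-to-one test is deliberately narrow: it never flags ordinary continuous features such as a duration, because one label maps to many values, but it also misses softer leakage (for example, duplicated flows shared between training and test splits), which needs separate checks.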

Figures (6)

Figure 1: Comprehensive Experimental Framework with Integrated XAI and Multi-Split Validation
Figure 2: Detailed XAI Integration Workflow in Security Operations
Figure 3: Confusion Matrix Analysis for Optimal Configuration (XGBoost + MRMR_70% + 60-10-30)
Figure 4: Comprehensive ROC Analysis for Multi-class Classification
Figure 5: Gini-based feature importance from Gradient Boosting (XGBoost) with MRMR feature selection (70% retention) on the 60-10-30 split. Features are ranked by their mean decrease in impurity, with packet length statistics showing dominant predictive power.
...and 1 more figure
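Figure 5 ranks features by Gini importance (mean decrease in impurity), a global, model-specific summary of a tree ensemble. The SHAP analysis named in the abstract instead attributes each individual prediction to its features using Shapley values. The sketch below computes exact Shapley values by enumerating feature subsets, with a single baseline row standing in for the background data that SHAP explainers typically average over. It is exponential in the number of features, which is why practical SHAP tooling relies on tree-specific (TreeSHAP) or sampling-based (KernelSHAP) algorithms. The model `toy_score` and its feature meanings are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict() at x, relative to a single baseline row.

    A coalition S keeps the features in S at their values in x and sets all
    other features to the baseline. Cost grows as 2**n, so this only suits
    a handful of features.
    """
    n = len(x)

    def coalition_value(subset):
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                phi[i] += weight * (coalition_value(s | {i}) - coalition_value(s))
    return phi


def toy_score(z):
    """Hypothetical threat score over [packet length mean, flow duration, SYN flag count]."""
    return 0.5 * z[0] + 0.2 * z[1] + 0.3 * z[0] * z[2]


if __name__ == "__main__":
    x, baseline = [2.0, 1.0, 1.0], [0.0, 0.0, 0.0]
    phi = shapley_values(toy_score, x, baseline)
    print(phi)  # approximately [1.3, 0.2, 0.3]
    # Efficiency property: attributions sum to f(x) - f(baseline) = 1.8
    print(sum(phi), toy_score(x) - toy_score(baseline))
```

The interaction term 0.3 * z[0] * z[2] contributes 0.6 at x and is split equally between the first and third features, so the first feature receives 0.5 * 2 + 0.3 = 1.3. A per-prediction breakdown of this kind is what distinguishes SHAP explanations from a single global impurity ranking.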