Explainable Deep Learning Models for Dynamic and Online Malware Classification
Quincy Card, Daniel Simpson, Kshitiz Aryal, Maanak Gupta, Sheikh Rabiul Islam
TL;DR
This work addresses the need for interpretable malware classification in both dynamic and online execution environments. It trains FFNN and CNN models on feature sets drawn from dynamic Android and online Windows datasets, and applies SHAP, LIME, and Permutation Importance to provide global and local explanations of predictions. The study demonstrates competitive classification performance, shows the benefits of SMOTE for imbalanced data, and analyzes the computational costs and practical robustness of explanation methods in time-series contexts. The findings offer guidance for deploying real-time, interpretable malware detectors and highlight future directions, including time-series explainability and adversarial considerations.
Abstract
Malware attacks have surged in recent years, necessitating more advanced preventive measures and remedial strategies. Several successful AI-based malware classification approaches exist, categorized into static, dynamic, or online analysis, but most of these models lack easily interpretable decisions and explanations for their processes. This paper investigates explainable malware classification across execution environments (such as dynamic and online), analyzing their respective strengths, weaknesses, and commonalities. To evaluate our approach, we train Feed Forward Neural Networks (FFNN) and Convolutional Neural Networks (CNN) to classify malware based on features obtained from dynamic and online analysis environments. Feature attribution for malware classification is performed with the explainability tools SHAP, LIME, and Permutation Importance. We perform a detailed evaluation of the global and local explanations computed in the experiments, discuss limitations, and offer recommendations for achieving a balanced approach.
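As a rough illustration of one of the attribution methods named above, the sketch below computes Permutation Importance as the mean drop in accuracy when a single feature column is shuffled. It is a minimal, library-free sketch and not the paper's implementation: the `toy_model` classifier, the two synthetic features, and the parameters `n_repeats` and `seed` are all hypothetical stand-ins for a trained FFNN or CNN and its real feature set.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows for which model(row) equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other feature untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical stand-in for a trained classifier: labels a sample as
# malware (1) when feature 0 exceeds a threshold; feature 1 is ignored.
def toy_model(row):
    return 1 if row[0] > 0.5 else 0

# Synthetic data (assumption): labels are generated by the toy model itself.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_model(row) for row in X]

scores = permutation_importance(toy_model, X, y)
print(scores)
```

Because the toy classifier never reads feature 1, shuffling that column leaves accuracy unchanged and its importance is zero, while shuffling feature 0 degrades accuracy substantially. This is a global explanation; SHAP and LIME can additionally attribute individual predictions.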
