The Road Less Traveled: Investigating Robustness and Explainability in CNN Malware Detection

Matteo Brosolo, Vinod Puthuvath, Mauro Conti

TL;DR

This paper tackles robustness and explainability of CNN-based malware detection using grayscale image representations of binaries. It introduces VXZoo, a diverse dataset spanning multiple years and 15 malware families, and evaluates a VGG16-inspired CNN under two obfuscations: morphing and UPX packing. It applies three explainability methods (HiResCAM, SHAP, and occlusion) to analyze heatmaps and identify artifacts. It also shows that obfuscation can reduce accuracy by up to 50%, while a lightweight augmentation mitigates much of the loss. The findings offer practical guidance for building interpretable, resilient intrusion-detection systems and for turning heatmap insights into potential code- or signature-level signals.
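The grayscale representation mentioned above is commonly built by reading a binary as a byte stream and laying the bytes out row by row, so that each byte becomes one pixel intensity between 0 and 255. The sketch below illustrates that generic idea only and is not the authors' pipeline. The row width of 256, the nearest-neighbour resize to a fixed CNN input size, and the function names `bytes_to_grayscale` and `resize_nearest` are all assumptions.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Lay out a binary's bytes row-major as a grayscale image.

    Each byte becomes one pixel intensity (0-255); the last row is
    zero-padded. The default width of 256 is an assumption, not a value
    taken from the paper.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))  # bytes(n) is n zero bytes
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def resize_nearest(image: list[list[int]], out_h: int, out_w: int) -> list[list[int]]:
    """Nearest-neighbour resize to a fixed input size.

    Hypothetical helper: the paper's actual preprocessing and input size
    are not stated in this summary.
    """
    if not image or not image[0]:
        raise ValueError("image must be non-empty")
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


if __name__ == "__main__":
    sample = bytes(range(256)) * 3 + b"\x90\x90"  # 770 bytes of toy data
    img = bytes_to_grayscale(sample)
    print(len(img), len(img[0]))      # 4 256
    print(resize_nearest(img, 2, 2))  # [[0, 128], [0, 128]]
```

Because the image is a direct view of the raw bytes, any transformation that rewrites most of the file, such as UPX compression, changes the image substantially. This is one plausible reason obfuscation can hurt an image-based classifier.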

Abstract

Machine learning has become a key tool in cybersecurity, improving both attack strategies and defense mechanisms. Deep learning models, particularly Convolutional Neural Networks (CNNs), have demonstrated high accuracy in detecting malware from images generated from binary data. However, the decision-making process of these black-box models remains difficult to interpret. This study addresses this challenge by combining quantitative analysis with explainability tools such as Occlusion Maps, HiResCAM, and SHAP to better understand CNN behavior in malware classification. We further demonstrate that obfuscation techniques can reduce model accuracy by up to 50%, and we propose a mitigation strategy to enhance robustness. Additionally, we analyze heatmaps from multiple tests and outline a methodology for identifying artifacts, aiding researchers in conducting detailed manual investigations. This work contributes to improving the interpretability and resilience of deep learning-based intrusion detection systems.
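Occlusion maps, one of the explainability tools named above, are model-agnostic. The method masks a patch of the input, re-runs the classifier, and records how much the target-class score drops. The following is a minimal pure-Python sketch of that generic technique. It assumes the classifier is any callable that maps an image to a sequence of per-class scores. The function name `occlusion_map` and the patch size, stride, and fill value are hypothetical choices, not the authors' settings.

```python
from typing import Callable, Sequence

Image = list[list[float]]
Model = Callable[[Image], Sequence[float]]


def occlusion_map(
    model: Model,
    image: Image,
    target: int,
    patch: int = 8,
    stride: int = 8,
    fill: float = 0.0,
) -> Image:
    """Return a heatmap where each pixel holds the average drop in the
    target-class score when patches covering it are replaced by `fill`.

    Defaults are hypothetical. Pixels never covered by a patch (when the
    patch grid does not tile the image exactly) get 0.0.
    """
    h, w = len(image), len(image[0])
    baseline = model(image)[target]
    heat = [[0.0] * w for _ in range(h)]
    counts = [[0] * w for _ in range(h)]
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            occluded = [row[:] for row in image]  # copy so the input is untouched
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    occluded[r][c] = fill
            # Positive drop: the masked region supported the target class.
            drop = baseline - model(occluded)[target]
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    heat[r][c] += drop
                    counts[r][c] += 1
    return [
        [heat[r][c] / counts[r][c] if counts[r][c] else 0.0 for c in range(w)]
        for r in range(h)
    ]


if __name__ == "__main__":
    # Toy "classifier": class 0 scores the sum of the top-left 2x2 block.
    def toy_model(img: Image) -> list[float]:
        return [sum(img[r][c] for r in range(2) for c in range(2)), 0.0]

    ones = [[1.0] * 4 for _ in range(4)]
    for row in occlusion_map(toy_model, ones, target=0, patch=2, stride=2):
        print(row)
```

Each patch position costs one forward pass. A stride smaller than the patch averages overlapping occlusions into a smoother map, at the price of more model evaluations.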

Paper Structure

This paper contains 16 sections, 8 figures, and 4 tables.

Figures (8)

  • Figure 1: Examples of different malware families' samples.
  • Figure 2: Evolution of partial training set evaluation.
  • Figure 3: Confusion matrix for the base test set and base training set.
  • Figure 4: Confusion matrix for the base test set and enhanced training set.
  • Figure 5: Classes on which explanation techniques identify interesting global patterns.
  • ...and 3 more figures