Through the Static: Demystifying Malware Visualization via Explainability
Matteo Brosolo, Vinod Puthuvath, Mauro Conti
TL;DR
This study tackles replicability and explainability in malware image classification by reproducing six CNN models on MalImg, Big2015, and a new dataset, VX-Zoo, and by applying CAM-based explanations (GradCAM and HiResCAM). It introduces a heatmap-driven masking technique and assesses its impact on a Visual Transformer (ViT), aiming to improve both classification and interpretability. The work shows that HiResCAM-based masks yield more faithful explanations and that cumulative heatmaps can improve ViT performance, with reported F1 gains ranging from 2% to 8% across the datasets relative to benchmark values. Overall, the paper highlights the need for transparent reporting and provides a practical framework for combining CAM-based explainability with masking to make malware classifiers more robust and trustworthy.
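The TL;DR contrasts GradCAM and HiResCAM. The following is a minimal NumPy sketch of the standard formulations of both methods (not code from this paper). Each starts from one convolutional layer's activations and the gradients of the target class score with respect to them. GradCAM averages each channel's gradient into a single weight, while HiResCAM multiplies gradients and activations element-wise before summing over channels. The function names and the `(C, H, W)` array layout are assumptions, and extracting activations and gradients from a real model is left out.

```python
import numpy as np

def gradcam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """GradCAM: weight each channel by its spatially averaged gradient.

    activations, gradients: shape (C, H, W), from the same conv layer,
    for one input and one target class (layout is an assumption).
    """
    weights = gradients.mean(axis=(1, 2))             # (C,) one weight per channel
    cam = np.tensordot(weights, activations, axes=1)  # (H, W) weighted channel sum
    return np.maximum(cam, 0.0)                       # keep positive evidence only

def hirescam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """HiResCAM: element-wise gradient * activation, then sum over channels."""
    cam = (gradients * activations).sum(axis=0)       # (H, W), no gradient averaging
    return np.maximum(cam, 0.0)

# Toy example: a single channel whose gradients have opposite signs.
A = np.ones((1, 2, 2))
G = np.array([[[1.0, -1.0], [0.0, 0.0]]])
print(gradcam(A, G))   # all zeros: the channel's mean gradient is 0
print(hirescam(A, G))  # [[1, 0], [0, 0]]
```

In this toy case, averaging lets gradients of opposite sign cancel, so GradCAM returns an empty map while HiResCAM still highlights the location that raised the class score. It illustrates one way the two methods can disagree on the same model and input.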
Abstract
Security researchers grapple with a surge of malicious files, which demands swift identification and classification of malware strains for effective protection. Visual classifiers, and in particular Convolutional Neural Networks (CNNs), have emerged as vital tools for this task. However, robustness and explainability, prominent concerns in other high-risk domains such as medicine and autonomous vehicles, remain understudied in the current literature. Although the deep learning visual classifiers presented in research obtain strong results without the need for expert feature extraction, their replicability has not been properly studied. Moreover, the literature does not make clear how these classifiers arrive at their answers. Our study addresses these gaps by replicating six CNN models and exploring their pitfalls. We evaluate the CNNs' performance and interpretability on two standard datasets, MalImg and Big2015, and on a newly created dataset called VX-Zoo, and we employ Class Activation Maps (CAMs), such as GradCAM and HiResCAM, to gauge the explainability of each model. With these tools, we investigate the underlying factors that lead different models to interpret the same inputs differently, helping human researchers discern patterns crucial for identifying distinct malware families and explaining why CNN models arrive at their conclusions. Beyond highlighting the patterns found in the interpretability study, we use the extracted heatmaps to improve both the performance and the explanation quality of Visual Transformer classifiers. This approach yields substantial F1-score improvements, ranging from 2% to 8% across the datasets, compared to benchmark values.
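The abstract does not detail how the extracted heatmaps are applied to the Visual Transformer's input. The following is therefore only a hypothetical sketch of one plausible heatmap-driven masking scheme, not the authors' procedure. Per-sample CAMs for a class are normalized and averaged into a cumulative heatmap. The top `keep_ratio` fraction of pixels is then kept as a binary mask, which is multiplied into a grayscale malware image before classification. All function names, the averaging rule, and the quantile threshold are assumptions.

```python
import numpy as np

def normalize(cam: np.ndarray) -> np.ndarray:
    """Rescale a non-negative heatmap to [0, 1]; an all-zero map stays zero."""
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def cumulative_heatmap(cams: list[np.ndarray]) -> np.ndarray:
    """Hypothetical: average normalized per-sample heatmaps into one map."""
    return np.mean([normalize(c) for c in cams], axis=0)

def heatmap_mask(heatmap: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Hypothetical: binary mask keeping the top `keep_ratio` fraction of pixels."""
    threshold = np.quantile(heatmap, 1.0 - keep_ratio)
    return (heatmap >= threshold).astype(np.float32)

def apply_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out pixels outside the mask; image and mask share shape (H, W)."""
    return image * mask

# Toy usage with two 2x2 CAMs for the same (assumed) malware family.
cams = [np.array([[0.0, 2.0], [4.0, 0.0]]), np.array([[0.0, 1.0], [1.0, 0.0]])]
heat = cumulative_heatmap(cams)             # [[0, 0.75], [1, 0]]
mask = heatmap_mask(heat, keep_ratio=0.5)   # keeps the two hottest pixels
print(apply_mask(np.full((2, 2), 7.0), mask))  # [[0, 7], [7, 0]]
```

Averaging over many samples is meant to smooth out noise in individual CAMs. A real pipeline would also need to resize the CAM grid to the image resolution the ViT expects, which this sketch omits.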
