Deep Learning Fusion For Effective Malware Detection: Leveraging Visual Features
Jahez Abraham Johny, Vinod P., Asmitha K. A., G. Radhamani, Rafidha Rehiman K. A., Mauro Conti
TL;DR
The paper tackles malware detection by fusing three visual representations of binaries (Grayscale Image, Entropy Graph, SimHash Image) using separate VGG16 branches and various fusion operators. It evaluates how fusion choices affect classification performance, revealing that concatenation across all modalities yields near-perfect F1 scores on the BIG2015 dataset and robust performance against obfuscated samples. Interpretability is addressed with Grad-CAM and t-SNE visualizations, showing meaningful, explorable feature regions and clusters that support trust in the model. The approach achieves real-time detection and demonstrates advantages over prior methods by preserving modality-specific information and providing insights into decision-making through activation maps and cluster analyses.
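As a rough illustration of how two of these visual representations can be derived from a binary, the sketch below reshapes raw bytes into grayscale pixel rows and computes a windowed Shannon entropy series (the values an entropy graph would plot). The function names, the image width of 16, and the window of 256 bytes are assumptions chosen for illustration; this summary does not state the paper's actual parameters, and the SimHash Image is omitted.

```python
import math
from collections import Counter


def bytes_to_grayscale(data: bytes, width: int = 16) -> list[list[int]]:
    """Reshape raw bytes into rows of `width` pixels (values 0-255).

    The last row is zero-padded; `width` is an illustrative assumption.
    """
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte chunk, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    return -sum((c / n) * math.log2(c / n) for c in Counter(chunk).values())


def entropy_series(data: bytes, window: int = 256) -> list[float]:
    """Entropy of consecutive non-overlapping windows of the binary."""
    return [shannon_entropy(data[i:i + window]) for i in range(0, len(data), window)]


if __name__ == "__main__":
    sample = bytes(range(256)) + bytes(256)  # high-entropy block, then zeros
    print(len(bytes_to_grayscale(sample)))   # number of pixel rows
    print(entropy_series(sample))            # one entropy value per window
```

Packed or encrypted regions tend to show entropy close to 8 bits per byte, which is one reason an entropy view can complement the raw grayscale view.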
Abstract
Malware has become a formidable threat, growing exponentially in both number and sophistication, so a solution that is easy to implement, reliable, and effective is imperative. Although recent research has introduced deep learning multi-feature fusion algorithms, these lack proper explainability. In this work, we investigate the power of fusing Convolutional Neural Network models trained on different modalities of a malware executable. We propose a novel multimodal fusion algorithm that leverages three visual malware features: Grayscale Image, Entropy Graph, and SimHash Image. We conducted exhaustive experiments on each feature independently and on combinations of all three, using the fusion operators average, maximum, add, and concatenate for malware detection and classification. The proposed strategy achieves a detection rate of 1.00 (on a scale of 0-1) in identifying malware in the given dataset. We examine its interpretability with the visualization techniques t-SNE and Grad-CAM. Experimental results show that the model works even on a highly imbalanced dataset. We also assessed the effectiveness of the proposed method on obfuscated malware and achieved state-of-the-art results. Our findings show that the VGG16 model can detect and classify malware in real time, within a matter of seconds, which makes the proposed methodology more reliable.
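The four fusion operators named in the abstract can be sketched on plain feature vectors. This is a minimal sketch, not the authors' implementation: the `fuse` function and the toy vectors are hypothetical, and in the paper the inputs would be features produced by the per-modality VGG16 branches.

```python
def fuse(features: list[list[float]], op: str) -> list[float]:
    """Combine per-modality feature vectors with one fusion operator.

    `features` holds one vector per modality (for example grayscale,
    entropy, SimHash). This helper is illustrative only.
    """
    if op == "concatenate":
        # Keeps every modality's features side by side.
        return [x for vec in features for x in vec]

    # Element-wise operators need vectors of equal length.
    if len({len(vec) for vec in features}) != 1:
        raise ValueError("element-wise fusion needs equal-length vectors")
    columns = list(zip(*features))

    if op == "add":
        return [sum(col) for col in columns]
    if op == "average":
        return [sum(col) / len(col) for col in columns]
    if op == "maximum":
        return [max(col) for col in columns]
    raise ValueError(f"unknown fusion operator: {op}")


if __name__ == "__main__":
    gray, entropy, simhash = [1.0, 4.0], [3.0, 2.0], [2.0, 6.0]  # toy values
    for op in ("average", "maximum", "add", "concatenate"):
        print(op, fuse([gray, entropy, simhash], op))
```

Concatenation is the only operator here that leaves each modality's features untouched, while the element-wise operators merge them into a single vector of the original length.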
