Deep Learning Fusion For Effective Malware Detection: Leveraging Visual Features

Jahez Abraham Johny, Vinod P., Asmitha K. A., G. Radhamani, Rafidha Rehiman K. A., Mauro Conti

TL;DR

The paper tackles malware detection by fusing three visual representations of binaries (Grayscale Image, Entropy Graph, SimHash Image) using separate VGG16 branches and various fusion operators. It evaluates how fusion choices affect classification performance, revealing that concatenation across all modalities yields near-perfect F1 scores on the BIG2015 dataset and robust performance against obfuscated samples. Interpretability is addressed with Grad-CAM and t-SNE visualizations, showing meaningful, explorable feature regions and clusters that support trust in the model. The approach achieves real-time detection and demonstrates advantages over prior methods by preserving modality-specific information and providing insights into decision-making through activation maps and cluster analyses.
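
As a rough illustration of how a binary can be turned into visual features, the sketch below maps raw bytes to grayscale pixel rows and computes a per-chunk Shannon entropy series. The source does not give the exact conversion, so the fixed row width, the zero padding, the chunk size, and the function names `bytes_to_grayscale_rows` and `chunk_entropy` are all assumptions made for this example, not the authors' procedure. The SimHash Image is not covered here.

```python
import math
from collections import Counter


def bytes_to_grayscale_rows(data: bytes, width: int = 16) -> list[list[int]]:
    """Treat each byte as one 8-bit pixel (0-255) and wrap into rows.

    Assumption: a fixed row width with zero padding on the last row.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    padded = data + bytes((-len(data)) % width)  # pad to a multiple of width
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def chunk_entropy(data: bytes, chunk_size: int = 256) -> list[float]:
    """Shannon entropy in bits per byte (0 to 8) of each consecutive chunk.

    Assumption: non-overlapping chunks; plotting this series against the
    chunk index gives one possible form of an entropy graph.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    entropies = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        total = len(chunk)
        counts = Counter(chunk)
        entropies.append(
            sum(-(c / total) * math.log2(c / total) for c in counts.values())
        )
    return entropies
```

A chunk in which every byte value is equally likely reaches the maximum of 8 bits per byte, which is typical of packed or encrypted regions, while a run of identical bytes scores 0.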

Abstract

Malware has become a formidable threat, growing exponentially in number and sophistication, so it is imperative to have a solution that is easy to implement, reliable, and effective. While recent research has introduced deep learning multi-feature fusion algorithms, they lack a proper explanation. In this work, we investigate the power of fusing Convolutional Neural Network models trained on different modalities of a malware executable. We propose a novel multimodal fusion algorithm that leverages three visual malware features: Grayscale Image, Entropy Graph, and SimHash Image. We conducted exhaustive experiments on each feature independently and on combinations of all three, using fusion operators such as average, maximum, add, and concatenate for effective malware detection and classification. The proposed strategy has a detection rate of 1.00 (on a scale of 0-1) in identifying malware in the given dataset. We explain its interpretability with visualization techniques such as t-SNE and Grad-CAM. Experimental results show the model works even for a highly imbalanced dataset. We also assessed the effectiveness of the proposed method on obfuscated malware and achieved state-of-the-art results. The proposed methodology is more reliable, as our findings show that the VGG16 model can detect and classify malware in a matter of seconds in real time.
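
The four fusion operators named in the abstract can be sketched on plain feature vectors, one per modality. This is a minimal illustration under stated assumptions: the vectors stand in for the outputs of the three VGG16 branches, and the helper names (`fuse_add`, `fuse_avg`, `fuse_max`, `fuse_concat`, `fuse`) are hypothetical rather than taken from the paper.

```python
from typing import Callable, Sequence

Vector = list[float]


def _check_same_length(features: Sequence[Vector]) -> None:
    """Element-wise operators need every branch to emit the same size."""
    if not features:
        raise ValueError("at least one feature vector is required")
    if len({len(f) for f in features}) != 1:
        raise ValueError("element-wise fusion needs equal-length vectors")


def fuse_add(features: Sequence[Vector]) -> Vector:
    _check_same_length(features)
    return [sum(values) for values in zip(*features)]


def fuse_avg(features: Sequence[Vector]) -> Vector:
    _check_same_length(features)
    return [sum(values) / len(features) for values in zip(*features)]


def fuse_max(features: Sequence[Vector]) -> Vector:
    _check_same_length(features)
    return [max(values) for values in zip(*features)]


def fuse_concat(features: Sequence[Vector]) -> Vector:
    # Concatenation keeps every modality's values side by side.
    return [value for feature in features for value in feature]


FUSION_OPERATORS: dict[str, Callable[[Sequence[Vector]], Vector]] = {
    "add": fuse_add,
    "avg": fuse_avg,
    "max": fuse_max,
    "concatenate": fuse_concat,
}


def fuse(features: Sequence[Vector], operator: str = "concatenate") -> Vector:
    """Fuse per-modality feature vectors with the named operator."""
    return FUSION_OPERATORS[operator](features)


# Toy stand-ins for the grayscale, entropy, and SimHash branch outputs.
grayscale, entropy, simhash = [1.0, 4.0], [3.0, 2.0], [2.0, 6.0]
fused = fuse([grayscale, entropy, simhash], "concatenate")
```

Add, average, and maximum keep the output the same size as a single branch and merge the modalities into shared dimensions, whereas concatenation triples the size and leaves each modality's features distinct for the classifier that follows.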

Paper Structure (17 sections, 7 equations, 12 figures, 7 tables)

Figures (12)

  • Figure 1: Process of classification with three features and a VGG16 model for each feature. The models are then fused using the operator $\odot$, which can be add, max, avg, or concatenation.
  • Figure 2: Highly imbalanced dataset of 9 malware families
  • Figure 3: Grid representation of calculating the interpolated pixel P. Pixels T11, T12, T21, and T22 are used to calculate $S_{1}$ and $S_{2}$, from which P is then calculated
  • Figure 4: Grayscale, Entropy Graph, and SimHash of malware families (Gatak, Kelihos_ver3, and Vundo)
  • Figure 5: Proposed Architecture
  • ...and 7 more figures
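
The interpolation described in the Figure 3 caption matches standard bilinear interpolation, which can be sketched as follows. The caption does not say where each pixel sits, so this example assumes T11 and T12 are the two neighbours on one row, T21 and T22 the two on the adjacent row, and that `dx` and `dy` are the fractional offsets of P from T11 along each axis. The function name `bilinear_pixel` is hypothetical.

```python
def bilinear_pixel(
    t11: float, t12: float, t21: float, t22: float, dx: float, dy: float
) -> float:
    """Interpolate pixel P from its four neighbours.

    dx, dy are offsets in [0, 1] measured from T11 (assumed layout:
    T11, T12 on the first row; T21, T22 on the second row).
    """
    if not (0.0 <= dx <= 1.0 and 0.0 <= dy <= 1.0):
        raise ValueError("dx and dy must lie in [0, 1]")
    s1 = (1.0 - dx) * t11 + dx * t12  # interpolate along the first row
    s2 = (1.0 - dx) * t21 + dx * t22  # interpolate along the second row
    return (1.0 - dy) * s1 + dy * s2  # interpolate between the two rows


# P at the centre of the four pixels is their mean.
p = bilinear_pixel(10.0, 20.0, 30.0, 40.0, dx=0.5, dy=0.5)
```

At the corners the result reduces to the corresponding source pixel, so resizing an image this way leaves aligned pixels unchanged and blends only the in-between positions.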