Packed Malware Detection Using Grayscale Binary-to-Image Representations

Ehab Alkhateeb, Ali Ghorbani, Arash Habibi Lashkari

TL;DR

Packing obfuscation challenges static malware analysis, motivating an image-based detection approach. The authors convert binaries to grayscale byte plots and compare classical (Gabor-jet) features with deep learning models (VGG16 and DenseNet121), finding that the CNNs perform better. They show strong generalization to unseen packers, with VGG16 favoring recall and DenseNet121 favoring precision and a low false-positive rate, supporting practical deployment in malware pipelines. The work highlights the value of diverse datasets and adversarial testing for hardening image-based packing detection, and points to future directions in richer visual encodings and lightweight architectures.
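
The byte-plot idea can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: the fixed row width of 256 bytes, zero-padding of the last row, nearest-neighbour resizing to 224×224, and repeating the gray channel three times to match the RGB input that ImageNet-style VGG16/DenseNet121 models expect are all choices made here for illustration, and the function names are hypothetical.

```python
import math

import numpy as np


def byte_plot(data: bytes, width: int = 256) -> np.ndarray:
    """Map each byte of a binary to one grayscale pixel (0-255), row by row.

    The fixed row width is an assumption; the last partial row is zero-padded.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    buf = np.frombuffer(data, dtype=np.uint8)
    height = max(1, math.ceil(buf.size / width))
    pixels = np.zeros(height * width, dtype=np.uint8)
    pixels[: buf.size] = buf
    return pixels.reshape(height, width)


def resize_nearest(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Nearest-neighbour resize to size x size (an illustrative choice)."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[rows[:, None], cols[None, :]]


def to_cnn_input(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize, scale to [0, 1], and repeat the gray channel 3x -> (H, W, 3)."""
    small = resize_nearest(img, size).astype(np.float32) / 255.0
    return np.repeat(small[..., None], 3, axis=-1)


if __name__ == "__main__":
    # Hypothetical usage: replace the synthetic bytes with open(path, "rb").read().
    sample = bytes(range(256)) * 40
    plot = byte_plot(sample)
    x = to_cnn_input(plot)
    print(plot.shape, x.shape)  # (40, 256) (224, 224, 3)
```

Because packers typically compress or encrypt the original code, packed regions tend to look like high-entropy noise in such a plot, which is the kind of visual texture a CNN can learn to separate from structured, unpacked code.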

Abstract

Detecting packed executables is a critical step in malware analysis, as packing obscures the original code and complicates static inspection. This study evaluates both classical feature-based methods and deep learning approaches that transform binary executables into visual representations (specifically, grayscale byte plots) and employ convolutional neural networks (CNNs) for automated classification of packed and non-packed binaries. A diverse dataset of benign and malicious Portable Executable (PE) files, packed using various commercial and open-source packers, was curated to capture a broad spectrum of packing transformations and obfuscation techniques. Classical models using handcrafted Gabor-jet features achieved strong discrimination at moderate computational cost. CNNs based on VGG16 and DenseNet121, however, significantly outperformed them, achieving high detection performance with well-balanced precision, recall, and F1-scores. DenseNet121 demonstrated slightly higher precision and lower false-positive rates, whereas VGG16 achieved marginally higher recall, indicating complementary strengths for practical deployment. Evaluation against unknown packers confirmed robust generalization, demonstrating that grayscale byte-plot representations combined with deep learning provide a useful and reliable approach for early detection of packed malware, enhancing malware analysis pipelines and supporting automated antivirus inspection.
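
The precision/recall trade-off between the two CNNs is easiest to read from the standard definitions, with "packed" as the positive class: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 is their harmonic mean, and the false-positive rate is FP / (FP + TN). The following is a minimal plain-Python sketch of those formulas; the helper names and the confusion counts in the usage section are invented for illustration and are not the paper's results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts with 'packed' (label 1) as the positive class."""
    tp: int
    fp: int
    tn: int
    fn: int


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Count outcomes; 1 = packed, 0 = not packed."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(t == 1 and p == 1 for t, p in pairs),
        fp=sum(t == 0 and p == 1 for t, p in pairs),
        tn=sum(t == 0 and p == 0 for t, p in pairs),
        fn=sum(t == 1 and p == 0 for t, p in pairs),
    )


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def scores(c: Confusion) -> dict:
    """Precision, recall, F1, and false-positive rate from confusion counts."""
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
        "fpr": _safe_div(c.fp, c.fp + c.tn),
    }


if __name__ == "__main__":
    # Hypothetical counts: one model misses fewer packed files (higher recall),
    # the other raises fewer false alarms (higher precision, lower FPR).
    high_recall = Confusion(tp=98, fp=6, tn=94, fn=2)
    high_precision = Confusion(tp=95, fp=2, tn=98, fn=5)
    for name, c in [("high-recall", high_recall), ("high-precision", high_precision)]:
        print(name, {k: round(v, 3) for k, v in scores(c).items()})
```

In a scanning pipeline, a high-recall model suits a first-pass filter where missed packed samples are costly, while a high-precision, low-FPR model suits settings where false alarms on benign software are expensive.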

Paper Structure

This paper contains 25 sections, 3 equations, 20 figures, and 5 tables.

Figures (20)

  • Figure 1: Packing [alkhateeb2024identifying].
  • Figure 2: AV engine workflow for packed files, emphasizing the binary packed/non-packed classification stage.
  • Figure 3: Grayscale images of packed PE files generated using different packer families.
  • Figure 4: Gabor feature extraction workflow.
  • Figure 5: Architecture of the VGG16 convolutional neural network.
  • ...and 15 more figures
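
Figure 4 summarizes the Gabor feature-extraction workflow. The sketch below shows one common way to build Gabor jets, i.e. vectors of complex Gabor filter response magnitudes over several scales and orientations, centred on a pixel, using NumPy only. The filter bank (wavelengths 4, 8, and 16 pixels, eight orientations, aspect ratio 0.5, σ = 0.56λ), zero padding, and regular-grid sampling are assumptions for illustration; the paper's exact parameters and sampling scheme are not given in this section, and the function names are hypothetical.

```python
import numpy as np


def gabor_kernel(sigma: float, theta: float, wavelength: float,
                 gamma: float = 0.5) -> np.ndarray:
    """Complex 2-D Gabor kernel: Gaussian envelope times a complex sinusoid."""
    half = int(np.ceil(3 * sigma))  # truncate the envelope at 3 sigma
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    x_t = x * np.cos(theta) + y * np.sin(theta)  # rotate coordinates by theta
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t ** 2 + (gamma * y_t) ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(1j * 2 * np.pi * x_t / wavelength)


def gabor_bank(wavelengths=(4, 8, 16), n_orient=8):
    """One kernel per (scale, orientation) pair; sigma = 0.56 * wavelength is
    an assumed heuristic giving roughly one-octave bandwidth."""
    return [gabor_kernel(0.56 * lam, k * np.pi / n_orient, lam)
            for lam in wavelengths for k in range(n_orient)]


def gabor_jet(img: np.ndarray, row: int, col: int, bank) -> np.ndarray:
    """Jet = magnitudes of every filter response centred on pixel (row, col)."""
    pad = max(k.shape[0] // 2 for k in bank)
    padded = np.pad(img.astype(np.float64), pad, mode="constant")  # zero padding
    r, c = row + pad, col + pad
    jet = []
    for k in bank:
        h = k.shape[0] // 2
        patch = padded[r - h:r + h + 1, c - h:c + h + 1]
        jet.append(np.abs(np.sum(patch * k)))  # correlation value at (row, col)
    return np.asarray(jet)


def gabor_features(img: np.ndarray, step: int = 16, bank=None) -> np.ndarray:
    """Concatenate jets sampled on a regular grid (grid spacing is assumed)."""
    bank = bank if bank is not None else gabor_bank()
    points = [(r, c) for r in range(0, img.shape[0], step)
              for c in range(0, img.shape[1], step)]
    return np.concatenate([gabor_jet(img, r, c, bank) for r, c in points])
```

The resulting fixed-length vector (grid points × 24 responses with the assumed bank) would then be passed to a classical classifier. This per-pixel formulation is simple but slow; an FFT-based convolution of the whole image per kernel would be the usual optimization.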