Malware Classification Based on Image Segmentation

Wanhu Nie

TL;DR

The paper addresses the challenge of classifying malware despite obfuscation by proposing a visualization-based approach that segments grayscale malware images according to executable PE sections, producing multi-channel inputs for CNN classifiers. It evaluates this method on the Microsoft BIG 2015 dataset using VGG16 and ResNet50, exploring multiple width-alignment schemes and two segmentation variants (Split and Mask). Results show competitive performance, with width alignment and per-section segmentation significantly influencing model effectiveness, and masking-based variants offering robustness under certain configurations. The work highlights potential robustness to structural obfuscation and suggests directions for future enhancements, such as attention-based masking and optimized width alignment as hyperparameters.

Abstract

Executable programs are highly structured files that can be recognized by operating systems and loaded into memory, analyzed for their dependencies, allocated resources, and ultimately executed. Each section of an executable program possesses distinct file and semantic boundaries, resembling puzzle pieces with varying shapes, textures, and sizes. These distinct sections, combined in different ways, constitute a complete executable program. This paper proposes a novel approach for the visualization and classification of malware. Specifically, we segment the grayscale images generated from malware binary files based on the section categories, resulting in multiple sub-images of different classes. These sub-images are then treated as multi-channel images and fed into a deep convolutional neural network for malware classification. Experimental results demonstrate that images of different malware section classes exhibit favorable classification characteristics. Additionally, we discuss how the width alignment of malware grayscale images can influence the performance of the model.
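
The segmentation idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the helper names `bytes_to_image` and `mask_by_section`, the `(class_name, start, size)` section tuples, and the interpretation of the Mask variant (bytes outside a section class are zeroed while positions are kept) are all assumptions made for illustration, since this summary does not define them.

```python
import numpy as np

def bytes_to_image(data: bytes, width: int) -> np.ndarray:
    """Lay raw bytes out row by row as a 2-D uint8 grayscale image.

    The last row is zero-padded so the byte count fills whole rows.
    """
    arr = np.frombuffer(data, dtype=np.uint8)
    height = -(-len(arr) // width)  # ceiling division
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[:len(arr)] = arr
    return padded.reshape(height, width)

def mask_by_section(data, sections, classes, width):
    """Build one image channel per section class (assumed Mask-style variant).

    sections: hypothetical list of (class_name, start_offset, size) tuples,
              e.g. derived from a PE section table.
    classes:  ordered list of class names; one output channel per entry.
    Returns an array of shape (len(classes), height, width).
    """
    arr = np.frombuffer(data, dtype=np.uint8)
    channels = []
    for cls in classes:
        masked = np.zeros_like(arr)  # writable copy, all zeros
        for name, start, size in sections:
            if name == cls:
                # Keep only the bytes that belong to this section class.
                masked[start:start + size] = arr[start:start + size]
        channels.append(bytes_to_image(masked.tobytes(), width))
    return np.stack(channels)

# Toy usage: 16 bytes, a 6-byte "code" section followed by a 10-byte "data" section.
toy = bytes(range(16))
toy_sections = [("code", 0, 6), ("data", 6, 10)]
channels = mask_by_section(toy, toy_sections, ["code", "data"], width=4)
print(channels.shape)  # (2, 4, 4)
```

Because every channel keeps the original byte positions, the channels stay spatially aligned with one another. A real pipeline would additionally resize each channel to the fixed input size expected by the CNN.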

Paper Structure (10 sections, 1 equation, 6 figures, 5 tables)

This paper contains 10 sections, 1 equation, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Correspondence between the section table and the sections in the PE format. In the PE header of Windows executable programs, the section table describes the spatial layout of the entire executable, including information such as the number, order, and size of sections. Each section carries specific types of code or data with independent attributes. Executable programs exhibit distinct file and semantic boundaries.
  • Figure 2: Grayscale image comparison of malware belonging to the same family. These malware samples are from the Microsoft Malware Dataset; only samples from two malware families are shown for comparison.
  • Figure 3: Two approaches for segmentation of malware grayscale images.
  • Figure 4: Impact of width alignment on grayscale images of malware. The grayscale images of malware shown are the results after scaling.
  • Figure 5: Architecture of VGG16.
  • ...and 1 more figures
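
The effect of width alignment noted for Figure 4 can be illustrated with a toy example. This is a sketch under stated assumptions, not the paper's experiment: the helper `to_rows`, the 8-byte repeating pattern, and the two widths are invented for illustration. It shows only that the same byte stream produces different 2-D structure depending on the row width chosen.

```python
import numpy as np

def to_rows(data: bytes, width: int) -> np.ndarray:
    """Reshape bytes into rows of `width`, dropping any trailing partial row."""
    arr = np.frombuffer(data, dtype=np.uint8)
    usable = (len(arr) // width) * width
    return arr[:usable].reshape(-1, width)

def columns_constant(img: np.ndarray) -> bool:
    """True when every row equals the first row, i.e. the image is vertical stripes."""
    return bool((img == img[0]).all())

# A byte stream with period 8 (hypothetical stand-in for repetitive file content).
pattern = bytes([0, 32, 64, 96, 128, 160, 192, 224]) * 12  # 96 bytes

aligned = to_rows(pattern, 8)     # width matches the period
misaligned = to_rows(pattern, 6)  # width does not divide the period

print(columns_constant(aligned))     # True: clean vertical stripes
print(columns_constant(misaligned))  # False: the pattern drifts from row to row
```

When the width matches the period of the underlying data, repeated structures line up vertically; otherwise they shear diagonally. A convolutional network sees these as different textures, which is one plausible reason the choice of width can matter.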