Deep Multi-Task Learning for Malware Image Classification

Ahmed Bensaoud, Jugal Kalita

TL;DR

This work tackles malware detection by reframing it as color-image classification and solving it with a deep multi-task learning framework across binaries from Windows, Android, Linux, MacOS, and iOS. It combines large-scale, multi-format data with CycleGAN-driven data augmentation for MacOS samples and a seven-task CNN with PReLU activations to achieve near-perfect accuracy. The study demonstrates that multi-task learning improves performance over single-task baselines and that color images capture richer discriminative information than grayscale. The results, on a public dataset, suggest strong practical potential for fast, robust malware detection against obfuscation techniques.
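
To make the color-image framing concrete, the following is a minimal sketch of one way a file's raw bytes could be packed into RGB pixels. This summary does not describe the paper's own conversion procedure, so the function name `bytes_to_rgb_rows`, the fixed image width, and the zero padding are all illustrative assumptions rather than the authors' method.

```python
def bytes_to_rgb_rows(data: bytes, width: int = 64):
    """Pack raw bytes into rows of (R, G, B) tuples.

    Hypothetical helper: three consecutive bytes become one pixel,
    and the final row is zero-padded so every row has `width` pixels.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    row_bytes = width * 3
    # Ceiling division; always produce at least one row.
    height = max(1, -(-len(data) // row_bytes))
    padded = data + bytes(height * row_bytes - len(data))
    # Slicing bytes yields bytes; tuple() turns them into ints 0..255.
    pixels = [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    rows = bytes_to_rgb_rows(bytes(range(256)), width=8)
    print(len(rows), len(rows[0]), rows[0][0])
```

The resulting rows can be written out with any image library; a grayscale variant would map one byte to one pixel instead of three.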

Abstract

Malicious software is a pernicious global problem. This paper proposes a novel multi-task learning framework for malware image classification, aimed at accurate and fast malware detection. We generate bitmap (BMP) and PNG images from malware features and feed them to a deep learning classifier. Our state-of-the-art multi-task learning approach has been tested on a new dataset, for which we collected approximately 100,000 benign and malicious PE, APK, Mach-O, and ELF examples. Experiments on seven tasks, each run separately with four activation functions (ReLU, LeakyReLU, PReLU, and ELU), show that PReLU gives the highest accuracy, more than 99.87% on all tasks. Our model can effectively detect a variety of obfuscation methods such as packing, encryption, and instruction overlapping, which strengthens the case for the model, and it reaches the accuracy of state-of-the-art methods.
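
The four activation functions named in the abstract differ only in how they treat negative inputs. The sketch below gives their standard scalar definitions. The default slope and `alpha` values are common conventions, not settings taken from the paper, and in a real network the PReLU slope `a` is a learned parameter rather than a function argument.

```python
import math


def relu(x: float) -> float:
    # Negative inputs are clamped to zero.
    return max(0.0, x)


def leaky_relu(x: float, slope: float = 0.01) -> float:
    # Negative inputs are scaled by a small fixed slope (assumed default).
    return x if x > 0 else slope * x


def prelu(x: float, a: float) -> float:
    # Same form as LeakyReLU, but `a` is learned during training.
    return x if x > 0 else a * x


def elu(x: float, alpha: float = 1.0) -> float:
    # Negative inputs saturate smoothly toward -alpha.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


if __name__ == "__main__":
    for f in (relu, leaky_relu, lambda v: prelu(v, 0.25), elu):
        print(f(-2.0), f(3.0))
```

All four return positive inputs unchanged, so any accuracy difference between them comes from how the negative branch is shaped.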

Paper Structure (33 sections, 2 equations, 21 figures, 7 tables)

Figures (21)

  • Figure 1: Total number of malware is increasing from quarter to quarter
  • Figure 2: Hard parameter sharing for multi-task learning (MTL)
  • Figure 3: Soft parameter sharing for multi-task learning (MTL)
  • Figure 4: PE file structure
  • Figure 5: Converting MacOS malware Mach-O file to image
  • ...and 16 more figures