Deep Multi-Task Learning for Malware Image Classification
Ahmed Bensaoud, Jugal Kalita
TL;DR
This work tackles malware detection by reframing it as color-image classification and solving it with a deep multi-task learning framework applied to binaries from Windows, Android, Linux, macOS, and iOS. It combines a large, multi-format dataset, CycleGAN-based data augmentation for macOS samples, and a seven-task CNN with PReLU activations, and it reaches near-perfect accuracy. The study shows that multi-task learning outperforms single-task baselines and that color images capture more discriminative information than grayscale images. The results, obtained on a public dataset, suggest strong practical potential for fast malware detection that is robust to obfuscation techniques.
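The summary does not say how a binary becomes a color image. One simple scheme reads the raw bytes of a file as consecutive RGB triples laid out at a fixed row width. The sketch below assumes that layout, a width of 256 pixels, zero padding of the last row, and PNG output only; it is an illustration, not the authors' pipeline. The helper names `bytes_to_rgb_rows`, `encode_png`, and `binary_to_png` are hypothetical, and the PNG is written with the standard library (`zlib`, `struct`) to stay dependency-free.

```python
import math
import struct
import zlib


def bytes_to_rgb_rows(data: bytes, width: int = 256) -> list[bytes]:
    """Map consecutive byte triples to RGB pixels, row by row (assumed layout)."""
    row_len = width * 3  # three bytes per RGB pixel
    # Zero-pad so the data fills a whole number of rows (at least one row).
    height = max(1, math.ceil(len(data) / row_len))
    padded = data.ljust(height * row_len, b"\x00")
    return [padded[i * row_len:(i + 1) * row_len] for i in range(height)]


def _chunk(tag: bytes, payload: bytes) -> bytes:
    # A PNG chunk is: big-endian length, 4-byte type, data, CRC-32 of type + data.
    crc = zlib.crc32(tag + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + tag + payload + struct.pack(">I", crc)


def encode_png(rows: list[bytes], width: int) -> bytes:
    """Encode 8-bit RGB scanlines as a minimal PNG (IHDR, IDAT, IEND only)."""
    # IHDR: width, height, bit depth 8, color type 2 (truecolor RGB),
    # compression 0, filter method 0, no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, len(rows), 8, 2, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (None).
    raw = b"".join(b"\x00" + row for row in rows)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )


def binary_to_png(path_in: str, path_out: str, width: int = 256) -> None:
    """Read a binary file and write its byte-level RGB rendering as a PNG."""
    with open(path_in, "rb") as f:
        data = f.read()
    with open(path_out, "wb") as f:
        f.write(encode_png(bytes_to_rgb_rows(data, width), width))
```

For example, `binary_to_png("sample.exe", "sample.png")` would render a PE file as a 256-pixel-wide color image. A fixed width keeps byte patterns from files of different sizes aligned the same way; the resulting images would still need resizing to whatever input size the CNN expects, and in practice an imaging library such as Pillow would usually handle the encoding.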
Abstract
Malicious software is a pernicious global problem. This paper proposes a novel multi-task learning framework for malware image classification that enables accurate and fast malware detection. We generate bitmap (BMP) and Portable Network Graphics (PNG) images from malware features and feed them to a deep learning classifier. Our state-of-the-art multi-task learning approach has been tested on a new dataset of approximately 100,000 benign and malicious PE, APK, Mach-O, and ELF examples that we collected. Experiments on seven tasks, each run separately with four activation functions (ReLU, LeakyReLU, PReLU, and ELU), show that PReLU gives the highest accuracy, above 99.87% on all tasks. Our model also effectively detects a variety of obfuscation methods, such as packing, encryption, and instruction overlapping, which strengthens the case for its practical value, in addition to achieving state-of-the-art accuracy.
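For reference, the four activation functions compared in these experiments have the standard definitions below. The abstract does not state the slope or scale values used, so the symbols here are the usual hyperparameters rather than the authors' settings. PReLU differs from LeakyReLU only in that its negative slope $a$ is learned during training (typically one value per channel) instead of being fixed.

```latex
\begin{aligned}
\mathrm{ReLU}(x) &= \max(0, x) \\
\mathrm{LeakyReLU}(x) &= \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases}
  \quad (\alpha \text{ fixed, e.g. } 0.01) \\
\mathrm{PReLU}(x) &= \begin{cases} x, & x > 0 \\ a x, & x \le 0 \end{cases}
  \quad (a \text{ learned}) \\
\mathrm{ELU}(x) &= \begin{cases} x, & x > 0 \\ \alpha\,(e^{x} - 1), & x \le 0 \end{cases}
  \quad (\alpha > 0)
\end{aligned}
```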
