Accelerating Malware Classification: A Vision Transformer Solution
Shrey Bavishi, Shrey Modi
TL;DR
Malware classification remains challenging because threats keep evolving and many malware families are closely related. The authors propose LeViT-MC, a two-stage architecture: a DenseNet CNN first separates benign from malicious samples, and a lightweight Vision Transformer (LeViT) then assigns each malicious sample to a family, with both stages operating on image-based representations of PE binaries. Using transfer learning from ImageNet and large-scale image tasks, LeViT-MC achieves 96.6% multiclass accuracy on MaleVis with high inference speed, outperforming previous methods. The work demonstrates that combining CNNs with lightweight ViTs and image-based representations is a viable route to rapid, fine-grained malware classification in practical security settings.
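The two-stage control flow described in the TL;DR can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `classify_two_stage`, `toy_detector`, and `toy_family_model` are hypothetical names, and the toy functions are simple placeholders standing in for the DenseNet detector and the LeViT family classifier, not the authors' models.

```python
from typing import Callable, List

Image = List[List[int]]  # rows of 0-255 pixel intensities


def classify_two_stage(
    image: Image,
    is_malicious: Callable[[Image], bool],
    predict_family: Callable[[Image], str],
) -> str:
    """Run the binary detector first; only malicious samples reach stage two."""
    if not is_malicious(image):
        return "benign"
    return predict_family(image)


# Hypothetical placeholders: a real system would call trained networks here.
def toy_detector(image: Image) -> bool:
    return sum(sum(row) for row in image) > 100


def toy_family_model(image: Image) -> str:
    return "family_A" if image[0][0] > 50 else "family_B"


benign_result = classify_two_stage([[0, 0], [0, 1]], toy_detector, toy_family_model)
malware_result = classify_two_stage([[200, 10], [0, 0]], toy_detector, toy_family_model)
print(benign_result, malware_result)
```

One consequence of this cascade is that the family classifier only runs on samples the first stage flags, so benign inputs exit after a single model call.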
Abstract
The escalating frequency and scale of recent malware attacks underscore the urgent need for swift and precise malware classification. A key challenge is accurately categorizing closely related malware families. To address it, this paper proposes LeViT-MC, a novel architecture that produces state-of-the-art results in malware detection and classification while also being more time-efficient. LeViT-MC combines a vision transformer-based architecture, an image-based visualization approach, and advanced transfer learning techniques. Experimental results on multi-class malware classification using the MaleVis dataset indicate that LeViT-MC has a significant advantage over existing models. This study underscores the importance of combining image-based and transfer learning techniques, with vision transformers at the forefront, in countering evolving cyber threats.
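As a rough illustration of what an image-based view of a binary can look like, the sketch below lays raw bytes out row by row as grayscale pixel intensities. This is a common generic technique and an assumption here, not necessarily the preprocessing used for LeViT-MC or MaleVis; the function name `bytes_to_grayscale`, the fixed `width`, and the zero padding are all hypothetical choices.

```python
import math
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 16) -> List[List[int]]:
    """Map each byte to one pixel (0-255), filling rows of `width` pixels."""
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Zero-pad so the final row is complete (an assumed padding scheme).
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Five sample bytes beginning with "MZ", the magic number that opens PE files.
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x03])
image = bytes_to_grayscale(sample, width=4)
print(image)
```

In practice the resulting grid would be resized to the input resolution the classifier expects before being passed to the model.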
