CNN-LSTM and Transfer Learning Models for Malware Classification based on Opcodes and API Calls
Ahmed Bensaoud, Jugal Kalita
TL;DR
This work addresses malware classification using API calls and opcodes to improve detection accuracy amid evasion strategies. It introduces a novel CNN-LSTM architecture and a broad transfer learning evaluation, employing 8-gram API/opcode representations derived from BoW, TF-IDF, and one-hot encoding. The authors benchmark fourteen pre-trained models (including ViT variants, ConvNeXt, RegNetY, EfficientNetV2, Sequencer2D-L, and Swin-T) against a custom CNN-LSTM. CNN-LSTM-3 reaches 99.91% accuracy, while Swin-T and Sequencer2D-L serve as strong baselines at 99.82% and 99.70%, respectively. The results highlight the viability of transfer learning for malware classification and show that a carefully designed CNN-LSTM can outperform many modern transfer learning models on large-scale opcode/API datasets.
Abstract
In this paper, we propose a novel model for a malware classification system based on Application Programming Interface (API) calls and opcodes, to improve classification accuracy. This system uses a novel design that combines a Convolutional Neural Network with Long Short-Term Memory. We extract opcode sequences and API calls from Windows malware samples for classification. We transform these features into N-gram sequences (N = 2, 3, and 10). Our experiments on a dataset of 9,749,57 samples produce a high accuracy of 99.91% using the 8-gram sequences. Our method significantly improves malware classification performance across a wide range of recent deep learning architectures, leading to state-of-the-art performance. In particular, we experiment with ConvNeXt-T, ConvNeXt-S, RegNetY-4GF, RegNetY-8GF, RegNetY-12GF, EfficientNetV2, Sequencer2D-L, Swin-T, ViT-G/14, ViT-Ti, ViT-S, ViT-B, ViT-L, and MaxViT-B. Among these architectures, Swin-T and Sequencer2D-L achieved high accuracies of 99.82% and 99.70%, respectively, comparable to our CNN-LSTM architecture although not surpassing it.
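The N-gram transformation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the helper names `ngrams` and `bag_of_ngrams` and the toy opcode trace are hypothetical, and the counting step is only a simple bag-of-words style representation of the kind the TL;DR refers to.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous n-gram (as a tuple) in a token sequence."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A sequence of length L yields L - n + 1 windows (none if L < n).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_ngrams(tokens, n):
    """Count how often each n-gram occurs (a bag-of-words over n-grams)."""
    return Counter(ngrams(tokens, n))

# Hypothetical opcode trace, for illustration only.
opcodes = ["push", "mov", "call", "push", "mov", "call", "ret"]

bigrams = ngrams(opcodes, 2)
counts = bag_of_ngrams(opcodes, 2)
```

Larger values of N capture longer instruction or API-call contexts but yield fewer windows per sample and a much larger vocabulary, which is one reason the choice of N matters in practice.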
