CNN-LSTM and Transfer Learning Models for Malware Classification based on Opcodes and API Calls

Ahmed Bensaoud, Jugal Kalita

TL;DR

This work addresses malware classification using API calls and opcodes to improve detection accuracy amid evasion strategies. It introduces a novel CNN-LSTM architecture and a broad transfer learning evaluation, employing 8-gram API/opcode representations derived from BoW, TF-IDF, and one-hot encoding. The authors benchmark fourteen pre-trained models (including ViT variants, ConvNeXt, RegNetY, EfficientNetV2, Sequencer2D-L, and Swin-T) against a custom CNN-LSTM, reporting CNN-LSTM-3 at 99.91% accuracy and high performance from Swin-T and Sequencer2D-L as strong baselines. The results highlight the viability of transfer learning for malware classification and show that a carefully designed CNN-LSTM can outperform many modern TL models on large-scale opcode/API datasets.

Abstract

In this paper, we propose a novel model for a malware classification system based on Application Programming Interface (API) calls and opcodes, to improve classification accuracy. This system uses a novel design that combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). We extract opcode sequences and API calls from Windows malware samples for classification. We transform these features into N-gram sequences (N = 2, 3, and 10). Our experiments on a dataset of 9,749,57 samples produce a high accuracy of 99.91% using the 8-gram sequences. Our method significantly improves malware classification performance across a wide range of recent deep learning architectures, leading to state-of-the-art performance. In particular, we experiment with ConvNeXt-T, ConvNeXt-S, RegNetY-4GF, RegNetY-8GF, RegNetY-12GF, EfficientNetV2, Sequencer2D-L, Swin-T, ViT-G/14, ViT-Ti, ViT-S, ViT-B, ViT-L, and MaxViT-B. Among these architectures, Swin-T and Sequencer2D-L achieved high accuracies of 99.82% and 99.70%, respectively, comparable to our CNN-LSTM architecture although not surpassing it.
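The N-gram feature step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `ngrams` and `bow_and_tfidf`, the toy opcode sequences, and the particular TF-IDF variant (term frequency as count over total, inverse document frequency as log(N/df)) are all assumptions made for illustration. The sketch turns each opcode sequence into N-grams, then builds a bag-of-words (BoW) count vector and a TF-IDF vector per sample and concatenates them.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bow_and_tfidf(samples, n):
    """Return (vocab, vectors), where each vector is [BoW counts | TF-IDF weights].

    Assumed TF-IDF variant: tf = count / total n-grams in the sample,
    idf = log(num_samples / document_frequency). Other variants exist.
    """
    counts_per_sample = [Counter(ngrams(s, n)) for s in samples]
    vocab = sorted(set().union(*counts_per_sample))
    num_samples = len(samples)
    # Document frequency: number of samples containing each n-gram.
    df = {g: sum(1 for c in counts_per_sample if g in c) for g in vocab}

    vectors = []
    for counts in counts_per_sample:
        total = sum(counts.values()) or 1  # avoid division by zero
        bow = [counts[g] for g in vocab]
        tfidf = [(counts[g] / total) * math.log(num_samples / df[g])
                 for g in vocab]
        vectors.append(bow + tfidf)  # concatenate the two representations
    return vocab, vectors


# Toy opcode sequences (hypothetical, not taken from the paper's dataset).
samples = [
    ["push", "mov", "call", "pop"],
    ["push", "mov", "xor", "pop"],
]
vocab, vectors = bow_and_tfidf(samples, n=2)
```

With this variant, an N-gram that appears in every sample (here `("push", "mov")`) gets a TF-IDF weight of zero, while its BoW count is kept, which is one reason to concatenate the two representations rather than use TF-IDF alone.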

Paper Structure (27 sections, 11 equations, 8 figures, 13 tables)

This paper contains 27 sections, 11 equations, 8 figures, 13 tables.

Figures (8)

  • Figure 1: Obtain TF-IDF and BoW for each malware sample and then concatenate.
  • Figure 2: Proposed model of CNN-LSTM.
  • Figure 3: Proposed approach with deep learning models compared.
  • Figure 4: Loss trends for various deep learning models. The CNN-LSTM-3 model has the lowest loss throughout training as the number of epochs increases.
  • Figure 5: Accuracy trends for various deep learning models. The CNN-LSTM-3 model has the best accuracy as the number of epochs increases.
  • ...and 3 more figures