EarlyMalDetect: A Novel Approach for Early Windows Malware Detection Based on Sequences of API Calls

Pascal Maniriho; Abdun Naser Mahmood; Mohammad Jabed Morshed Chowdhury

TL;DR

This paper presents EarlyMalDetect, a preventive Windows malware detection framework that first predicts upcoming API calls using a fine-tuned GPT-2 model and then classifies the extended API-call sequence with a DistilBERT-derived embedding fed into a BiGRU-Attention classifier. By enabling next-API-call prediction from an initial short sequence, the approach aims to detect malware at the very start of execution, potentially stopping threats before payload delivery. Empirical results on two API-call datasets show strong accuracy and AUC, improving as longer predicted sequences are used, and outperforming several DistilBERT-based baselines. The work advances early threat detection using transfer learning and attention-guided sequence modeling, with practical implications for mitigating zero-day Windows malware.
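The first stage described above, extending a short initial API-call sequence with predicted calls before classification, can be illustrated with a deliberately simplified sketch. The paper uses a fine-tuned GPT-2 model for this step; the bigram frequency table below is only a stand-in assumption so that the example runs with the standard library alone. The function names `fit_bigram` and `extend_sequence` and the sample traces are hypothetical and are not taken from the paper or its datasets.

```python
from collections import Counter, defaultdict


def fit_bigram(sequences):
    """Count how often each API call follows another in the training traces.

    Stand-in for the fine-tuned GPT-2 predictor (an assumption for illustration).
    """
    table = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            table[prev][nxt] += 1
    return table


def extend_sequence(table, initial, n_new):
    """Greedily append up to n_new predicted calls to the initial sequence."""
    extended = list(initial)
    if not extended:
        return extended  # nothing to condition on
    for _ in range(n_new):
        followers = table.get(extended[-1])
        if not followers:
            break  # no observed successor for the last call
        extended.append(followers.most_common(1)[0][0])
    return extended


# Hypothetical traces; the API names are real Windows native calls,
# but the sequences themselves are made up for this example.
traces = [
    ["NtOpenFile", "NtReadFile", "NtClose"],
    ["NtOpenFile", "NtReadFile", "NtWriteFile", "NtClose"],
    ["NtOpenFile", "NtReadFile", "NtClose"],
]
table = fit_bigram(traces)
extended = extend_sequence(table, ["NtOpenFile"], n_new=2)
print(extended)
```

In the framework itself, the extended sequence (initial calls plus predicted calls) would then be embedded and passed to the classifier; here it is simply printed.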

Abstract

In this work, we propose EarlyMalDetect, a novel approach for early Windows malware detection based on sequences of API calls. Our approach leverages generative transformer models and attention-guided deep recurrent neural networks to accurately identify patterns of malicious behavior in the early stage of malware execution. By analyzing the sequences of API calls invoked during execution, the proposed approach can classify executable files (programs) as malware or benign by predicting their behavior from a few shots (the initial API calls they invoke). EarlyMalDetect can predict and reveal what a malware program is going to do on the target system before it happens, which can help to stop the program before it executes its malicious payload and infects the system. Specifically, EarlyMalDetect relies on a transformer model fine-tuned on API calls, which can predict the next API calls to be invoked by a malicious or benign executable. Our extensive experimental evaluations show that the proposed approach is highly effective in predicting malware behaviors and can be used as a preventive measure against zero-day threats in Windows systems.
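To make "attention-guided" concrete, the following is a minimal sketch of attention pooling over per-step recurrent hidden states, followed by a sigmoid output for the malware/benign decision. It assumes a simple dot-product scoring against a query vector; the paper's exact attention formulation is not given here, and the names `attention_pool`, `malware_probability`, `query`, and `out_weights`, as well as all numeric values, are hypothetical. In the actual approach the hidden states would come from a bidirectional GRU and the weights would be learned.

```python
import math


def attention_pool(hidden_states, query):
    """Return softmax attention weights and the weighted sum of hidden states."""
    # Dot-product score of each time step against the query vector (assumed form).
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    # Numerically stable softmax over time steps.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


def malware_probability(context, out_weights, bias=0.0):
    """Sigmoid over a linear layer, standing in for the fully connected output."""
    z = sum(c * w for c, w in zip(context, out_weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Made-up hidden states for a three-call sequence (two features per step).
hidden_states = [[0.1, 0.0], [0.9, 0.2], [0.3, 0.8]]
query = [1.0, 0.5]
weights, context = attention_pool(hidden_states, query)
prob = malware_probability(context, out_weights=[1.5, -0.5])
print(weights, prob)
```

The attention weights indicate which positions in the API-call sequence contribute most to the pooled representation that the final layer scores.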

Paper Structure (21 sections, 3 equations, 6 figures, 15 tables, 2 algorithms)

Introduction
Background
Sequence Predictions
Sequence Classification
Related Work
The Proposed Malware Detection Approach
Collecting Datasets of API Call Sequences
Fine-tuning the GPT-2 Transformer Model on API Calls Dataset Through Transfer Learning
Designing the Detection Model
API call Sequence Numerical Representation
Bidirectional GRU Layer
Attention Layer
Fully Connected Layer
Testing the Trained Model
Experimental Evaluations and Results
...and 6 more sections

Figures (6)

Figure 1: Modeling sequences (a) Sequence prediction (b) Sequence classification.
Figure 2: The process for fine-tuning the GPT-2 model on a dataset of API call sequences through transfer learning.
Figure 3: The proposed approach for early malware detection based on the fine-tuned transformer model.
Figure 4: The process for predicting the next API calls given the initial sequence invoked by an executable file during execution.
Figure 5: TNR, FNR, TPR, and FPR achieved by the proposed early malware detection approach on dataset 1.
...and 1 more figure
