Deep Learning-Driven Malware Classification with API Call Sequence Analysis and Concept Drift Handling
Bishwajit Prasad Gond, Durga Prasad Mohapatra
TL;DR
This work addresses malware classification under concept drift by combining deep learning (DL) with a genetic algorithm (GA) so that the classifier can adapt to evolving threats. It uses API-call sequences, represented as n-grams and collected through dynamic analysis of sandboxed PE samples, and augments the DL models with GA-driven feature mutation that selects an expanded yet discriminative feature set. The approach yields strong CNN performance on drift-free data and robust results under drift, with improvements linked to mutated-feature ensembles and a 1% feature corpus augmentation (10,500 features from 101,248 mutants). The framework, demonstrated on VirusShare/VirusTotal-derived data and implemented in a public codebase, offers a scalable path to real-time, drift-resilient malware classification in dynamic cybersecurity environments.
Abstract
Malware classification in dynamic environments is challenging because of concept drift: the statistical properties of malware data evolve over time, which complicates detection. To address this issue, we propose a deep learning framework enhanced with a genetic algorithm to improve malware classification accuracy and adaptability. Our approach uses the genetic algorithm's mutation operations and fitness score evaluations to continuously refine the deep learning model, making it more robust against evolving malware threats. Experimental results demonstrate that this hybrid method significantly enhances classification performance and adaptability, outperforming traditional static models. The proposed approach offers a promising solution for real-time malware classification in ever-changing cybersecurity landscapes.
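To make the ingredients named above concrete, the following minimal Python sketch shows one way API-call n-grams might be extracted from a trace, mutated, and ranked by a fitness score. It is an illustration under stated assumptions, not the authors' implementation: the function names, the single-position substitution used as the mutation, and the fitness definition (the spread of per-class occurrence rates) are all hypothetical choices made for this example, as are the toy traces and class labels.

```python
import random


def api_ngrams(calls, n):
    """Return the contiguous n-grams (as tuples) of one API-call trace."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def mutate(ngram, vocabulary, rng):
    """Assumed mutation: overwrite one random position with a random API name."""
    pos = rng.randrange(len(ngram))
    mutated = list(ngram)
    mutated[pos] = rng.choice(vocabulary)
    return tuple(mutated)


def fitness(ngram, traces_by_class, n):
    """Assumed fitness: gap between the highest and lowest per-class
    fraction of traces that contain the n-gram (0 = uninformative)."""
    rates = []
    for traces in traces_by_class.values():
        hits = sum(ngram in set(api_ngrams(t, n)) for t in traces)
        rates.append(hits / len(traces))  # assumes every class has >= 1 trace
    return max(rates) - min(rates)


def expand_features(seeds, vocabulary, traces_by_class, n, keep, rng):
    """Mutate each seed n-gram once and keep the `keep` fittest new mutants."""
    mutants = {mutate(g, vocabulary, rng) for g in seeds} - set(seeds)
    ranked = sorted(mutants,
                    key=lambda g: fitness(g, traces_by_class, n),
                    reverse=True)
    return ranked[:keep]


# Toy data: class labels and traces are invented for illustration only.
traces_by_class = {
    "family_a": [
        ["CreateFileW", "WriteFile", "CryptEncrypt", "CloseHandle"],
        ["CreateFileW", "CryptEncrypt", "WriteFile"],
    ],
    "family_b": [
        ["CreateFileW", "ReadFile", "CloseHandle"],
    ],
}
vocabulary = sorted({c for ts in traces_by_class.values() for t in ts for c in t})
seeds = api_ngrams(traces_by_class["family_a"][0], 2)
selected = expand_features(seeds, vocabulary, traces_by_class,
                           n=2, keep=2, rng=random.Random(0))
```

In this sketch, `keep` plays the role of the augmentation budget: only a small, high-scoring fraction of the generated mutants is added to the feature set, which loosely mirrors the idea of retaining a small share of a much larger mutant pool.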
