Zero Day Ransomware Detection with Pulse: Function Classification with Transformer Models and Assembly Language
Matthew Gaber, Mohiuddin Ahmed, Helge Janicke
TL;DR
Pulse addresses zero-day ransomware detection by combining dynamic behavior data captured by the Peekaboo DBI tool with Transformer-based models applied to Assembly language. It introduces a feature-engineering pipeline and a classifier that treats ASM instructions as tokens and functions as sentences. The captured instructions follow Zipf's law, $f(r) \propto r^{-a}$, where $f(r)$ is the frequency of the instruction of rank $r$ and $a > 0$, which is used to justify context-based learning. Experiments demonstrate state-of-the-art performance and strong generalization to unseen malicious functionality, with accuracies routinely exceeding 95% on balanced test sets and superior robustness to never-before-seen samples. The work also provides open-source Pulse models and tooling to facilitate adoption and replication in defensive cybersecurity workflows.
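As a rough illustration of the Zipf's law claim, the sketch below estimates the exponent of a rank-frequency power law from a list of instruction tokens by fitting a straight line in log-log space. The function name `zipf_exponent`, the least-squares fitting method, and the toy corpus are assumptions made for this example; they are not taken from Pulse or Peekaboo.

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Estimate a in f(r) proportional to r^(-a) via least squares on log-log data.

    Hypothetical helper for illustration; not part of the Pulse tooling.
    """
    # Frequencies sorted from most to least common give rank 1, 2, 3, ...
    counts = sorted(Counter(tokens).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct tokens")

    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # The fitted slope of log f against log r is -a.
    return -sxy / sxx


# Toy corpus (invented): the mnemonic at rank r appears exactly 840 / r times,
# so the frequencies follow an ideal power law with exponent 1.
mnemonics = ["mov", "push", "call", "pop", "lea", "cmp", "jz", "xor"]
corpus = []
for rank, mnemonic in enumerate(mnemonics, start=1):
    corpus.extend([mnemonic] * (840 // rank))

a = zipf_exponent(corpus)
print(f"estimated exponent a = {a:.3f}")
```

On real instruction traces the fit would be noisier, and an estimate near 1 would be consistent with the Zipfian behaviour seen in natural language.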
Abstract
Finding automated AI techniques to proactively defend against malware has become increasingly critical. The ability of an AI model to correctly classify novel malware depends on the quality of the features it is trained on, and the authenticity of those features depends on the analysis tool. Peekaboo, a Dynamic Binary Instrumentation tool, defeats evasive malware to capture its genuine behavior. The ransomware Assembly instructions captured by Peekaboo follow Zipf's law, a principle also observed in natural languages, indicating that Transformer models are particularly well suited to binary classification. We propose Pulse, a novel framework for zero-day ransomware detection with Transformer models and Assembly language. Pulse, trained with the Peekaboo ransomware and benign software data, uniquely identifies truly new samples with high accuracy. Pulse eliminates any familiar functionality across the test and training samples, forcing the Transformer model to detect malicious behavior based solely on context and novel Assembly instruction combinations.
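One way to picture the removal of familiar functionality from the test set is to drop every test function whose instruction sequence already occurs in the training set. The sketch below does this with exact matching on a hashed, lightly normalized instruction list. The abstract does not describe how Pulse actually decides that two functions are the same, so the matching rule, the names `function_fingerprint` and `drop_seen_functions`, and the toy functions are all assumptions for illustration.

```python
import hashlib


def function_fingerprint(instructions):
    """Hash a function's instruction sequence.

    Assumed normalization (not from the paper): lowercase each instruction
    and collapse runs of whitespace before hashing.
    """
    normalized = "\n".join(" ".join(ins.lower().split()) for ins in instructions)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def drop_seen_functions(train_functions, test_functions):
    """Keep only test functions whose fingerprint is absent from training."""
    seen = {function_fingerprint(func) for func in train_functions}
    return [func for func in test_functions if function_fingerprint(func) not in seen]


# Toy data (invented): each function is a list of instruction strings.
train = [
    ["push ebp", "mov ebp, esp", "call CryptEncrypt"],
    ["xor eax, eax", "ret"],
]
test = [
    ["XOR  eax, eax", "ret"],  # matches a training function after normalization
    ["mov ecx, [ebp+8]", "call WriteFile", "ret"],
]

novel = drop_seen_functions(train, test)
print(len(novel), "test function(s) remain after filtering")
```

With a filter of this kind, a classifier evaluated on `novel` cannot score well by memorizing functions it has already seen, which matches the intent stated above. Exact matching is the simplest possible criterion; a real pipeline might use a looser notion of similarity.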
