A Sysmon Incremental Learning System for Ransomware Analysis and Detection
Jamil Ispahany, MD Rafiqul Islam, M. Arif Khan, MD Zahidul Islam
TL;DR
Ransomware detection faces a training-gap bottleneck as new strains emerge. The authors propose SILRAD, an online incremental learning system that combines Sysmon logs, fastText embeddings, PCC-based feature selection, Adaptive Random Forest, and ADWIN concept-drift detection to adapt continuously without retraining from scratch. On drift-enabled data, SILRAD achieves about 98.9% accuracy and an MCC of about 94%, while using less memory and offering faster real-time inference than competing incremental methods. This work demonstrates a practical, drift-aware framework for real-time ransomware analytics that mitigates data exposure during model updates and supports deployment in production-like environments.
Abstract
In the face of increasing cyber threats, particularly ransomware attacks, there is a pressing need for advanced detection and analysis systems that adapt to evolving malware behaviours. The use of machine learning (ML) to counter ransomware attacks has grown steadily in the literature. Unfortunately, most of these proposals rely on non-incremental learning approaches that require the underlying models to be retrained from scratch to detect new ransomware, wasting time and resources. This approach is problematic because it leaves sensitive data vulnerable to attack during retraining, as newly emerging ransomware strains may go undetected until the model is updated. Furthermore, most of these approaches are not designed to detect ransomware in real-time data streams, limiting their effectiveness in complex network environments. To address this challenge, we present the Sysmon Incremental Learning System for Ransomware Analysis and Detection (SILRAD), which enables continuous updates to the underlying model and effectively closes the training gap. By leveraging the capabilities of Sysmon for detailed monitoring of system activities, our approach integrates online incremental learning techniques to enhance the adaptability and efficiency of ransomware detection. The most valuable features for detection were selected using the Pearson Correlation Coefficient (PCC), and concept drift detection was implemented through the ADWIN algorithm, ensuring that the model remains responsive to changes in ransomware behaviour. We compared SILRAD against other popular techniques, such as Hoeffding Trees (HT) and the Leveraging Bagging Classifier (LB); SILRAD achieved a detection accuracy of 98.89% and a Matthews Correlation Coefficient (MCC) of 94.11%, demonstrating the effectiveness of our technique.
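To make two of the quantities named in the abstract concrete, the following is a minimal Python sketch of PCC-based feature ranking and of the MCC computation. It is an illustration under stated assumptions, not the authors' pipeline: the feature names (`file_writes`, `uptime_parity`, `constant_flag`), the toy values, and the 0.5 selection threshold are all hypothetical, and the computation is done in batch rather than on a stream.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(
    columns: Dict[str, Sequence[float]],
    labels: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[str]:
    """Keep features whose absolute correlation with the label meets the threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, labels)) >= threshold
    ]


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy data: 1 = ransomware, 0 = benign (illustrative values only).
labels = [1, 1, 0, 0, 1, 0]
columns = {
    "file_writes": [9, 8, 1, 2, 7, 1],    # tracks the label closely
    "uptime_parity": [1, 2, 1, 2, 1, 2],  # weakly related to the label
    "constant_flag": [3, 3, 3, 3, 3, 3],  # constant, so uninformative
}
selected = select_features(columns, labels)
```

MCC is reported alongside accuracy because ransomware data is typically imbalanced: a classifier that labels everything benign can score high accuracy while its MCC stays at zero. Note that MCC ranges from -1 to 1, so a figure such as 94.11% corresponds to a coefficient of about 0.94.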
