Obfuscated Malware Detection: Investigating Real-world Scenarios through Memory Analysis
S M Rakib Hasan, Aakar Dhakal
TL;DR
The paper tackles obfuscated malware that evades traditional detectors by leveraging memory dump analysis and multiclass machine learning on the CIC-MalMem-2022 dataset. It evaluates Random Forest, MLP, KNN, and XGBoost for both binary detection and malware-family classification, incorporating undersampling and ADASYN-based oversampling to address class imbalance. Binary detection achieves near-perfect accuracy, while multiclass classification benefits most from ADASYN oversampling, with XGBoost consistently delivering top performance. The work contributes open-source code and demonstrates a practical memory-based approach to strengthen cybersecurity against evolving obfuscated threats.
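To make the ADASYN-based oversampling step concrete, the following is a minimal, standard-library-only sketch of the idea for a single minority class. It is not the authors' code: the function name `adasyn_sketch`, its parameters, and the toy data are assumptions made for illustration, and the sketch simplifies the technique to a binary (minority versus everything else) setting. A real pipeline would more likely rely on a maintained implementation such as the `ADASYN` class in the imbalanced-learn package.

```python
import math
import random


def adasyn_sketch(X, y, minority_label, k=3, seed=0):
    """Generate synthetic minority samples in the spirit of ADASYN.

    Hypothetical helper for illustration only; binary simplification.
    X is a list of numeric tuples, y the matching list of labels.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    n_other = len(X) - len(minority_idx)
    # How many synthetic points would bring the minority up to the other group.
    total_needed = n_other - len(minority_idx)
    if total_needed <= 0 or len(minority_idx) < 2:
        return []

    # Step 1: for each minority point, the share of its k nearest
    # neighbours (over the whole dataset) that carry a different label.
    ratios = []
    for i in minority_idx:
        others = [(math.dist(X[i], X[j]), y[j]) for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda t: t[0])[:k]
        ratios.append(sum(lab != minority_label for _, lab in nearest) / len(nearest))

    # Step 2: normalise the shares into weights (uniform if all are zero),
    # so harder-to-learn points receive more synthetic neighbours.
    total_ratio = sum(ratios)
    if total_ratio == 0:
        weights = [1 / len(minority_idx)] * len(minority_idx)
    else:
        weights = [r / total_ratio for r in ratios]

    # Step 3: interpolate between each minority point and a randomly chosen
    # nearby minority point. Rounding makes the final count approximate.
    synthetic = []
    for i, w in zip(minority_idx, weights):
        peers = sorted(
            (j for j in minority_idx if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )[:k]
        for _ in range(round(w * total_needed)):
            peer = X[rng.choice(peers)]
            lam = rng.random()
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(X[i], peer)))
    return synthetic


if __name__ == "__main__":
    # Toy data: eight "benign" points and three "malware" points (made up).
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (0, 2), (1, 2),
         (5, 5), (5, 6), (6, 5)]
    y = [0] * 8 + [1] * 3
    print(len(adasyn_sketch(X, y, minority_label=1)), "synthetic samples")
```

The weighting in step 2 is what distinguishes this family of methods from uniform oversampling: minority samples surrounded by other classes receive proportionally more synthetic neighbours. Synthetic samples should be generated from the training split only, so that the test split still reflects the original class distribution.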
Abstract
In the era of the internet and smart devices, the detection of malware has become crucial for system security. Malware authors increasingly employ obfuscation techniques to evade advanced security solutions, making it challenging to detect and eliminate threats. Obfuscated malware, adept at hiding itself, poses a significant risk to various platforms, including computers, mobile devices, and IoT devices. Conventional methods like heuristic-based or signature-based systems struggle against this type of malware, as it leaves no discernible traces on the system. In this research, we propose a simple and cost-effective obfuscated malware detection system through memory dump analysis, utilizing diverse machine-learning algorithms. The study focuses on the CIC-MalMem-2022 dataset, designed to simulate real-world scenarios and assess memory-based obfuscated malware detection. We evaluate the effectiveness of machine learning algorithms, such as decision trees, ensemble methods, and neural networks, in detecting obfuscated malware within memory dumps. Our analysis spans multiple malware categories, providing insights into algorithmic strengths and limitations. By offering a comprehensive assessment of machine learning algorithms for obfuscated malware detection through memory analysis, this paper contributes to ongoing efforts to enhance cybersecurity and fortify digital ecosystems against evolving and sophisticated malware threats. The source code is made open-access for reproducibility and future research endeavours. It can be accessed at https://bit.ly/MalMemCode.
