The Power of MEME: Adversarial Malware Creation with Model-Based Reinforcement Learning
Maria Rigaki, Sebastian Garcia
TL;DR
This work tackles automated evasion of static malware detectors under black-box access by introducing MEME, a model-based reinforcement learning framework that jointly performs malware evasion and model extraction to train high-fidelity surrogate models. MEME trains a PPO-based policy against a surrogate that is iteratively refined using limited target queries and auxiliary data, achieving evasion rates of $32\%$ to $73\%$ and surrogate label agreement of $97\%$ to $99\%$ with only $2{,}048$ queries. Compared with Random, PPO, MAB, and GAMMA, MEME generally achieves superior evasion performance across four targets while keeping surrogate learning efficient, which makes it practical for robustness testing of detectors. The results highlight MEME’s potential for evaluating defenses and guiding the design of more resistant malware detection pipelines; future work includes ensemble surrogates and broader target coverage.
Abstract
Due to the proliferation of malware, defenders are increasingly turning to automation and machine learning as part of the malware detection tool-chain. However, machine learning models are susceptible to adversarial attacks, so the robustness of models and products must be tested. Meanwhile, attackers also seek to automate malware generation and evasion of antivirus systems, and defenders try to gain insight into their methods. This work proposes a new algorithm that combines Malware Evasion and Model Extraction (MEME) attacks. MEME uses model-based reinforcement learning to adversarially modify Windows executable binary samples while simultaneously training a surrogate model that has high agreement with the target model being evaded. To evaluate this method, we compare it with two state-of-the-art attacks in adversarial malware creation, using three well-known published models and one antivirus product as targets. Results show that MEME outperforms the state-of-the-art methods in terms of evasion capabilities in almost all cases, producing evasive malware with an evasion rate in the range of 32-73%. It also produces surrogate models whose predicted labels agree with those of the respective target models in 97-99% of cases. The surrogate could be used to fine-tune and improve the evasion rate in the future.
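The two headline metrics can be illustrated with a minimal sketch. It assumes that the evasion rate is the fraction of modified malware samples the target labels as benign, and that label agreement is the fraction of samples on which the surrogate and the target predict the same label. The function names, the `0`/`1` label encoding, and the toy inputs are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def label_agreement(target_labels: Sequence[int],
                    surrogate_labels: Sequence[int]) -> float:
    """Fraction of samples where surrogate and target predict the same label."""
    if len(target_labels) != len(surrogate_labels):
        raise ValueError("label sequences must have the same length")
    if len(target_labels) == 0:
        raise ValueError("label sequences must not be empty")
    matches = sum(t == s for t, s in zip(target_labels, surrogate_labels))
    return matches / len(target_labels)


def evasion_rate(target_verdicts: Sequence[int], benign_label: int = 0) -> float:
    """Fraction of modified malware samples that the target labels as benign.

    Assumes every entry is the target's verdict on a sample that was
    malicious before modification (hypothetical encoding: 0 benign, 1 malicious).
    """
    if len(target_verdicts) == 0:
        raise ValueError("verdict sequence must not be empty")
    evaded = sum(v == benign_label for v in target_verdicts)
    return evaded / len(target_verdicts)


if __name__ == "__main__":
    # Toy data only; these are not results from the paper.
    target = [1, 1, 0, 1, 0, 0, 1, 1]
    surrogate = [1, 1, 0, 1, 0, 1, 1, 1]
    print(f"label agreement: {label_agreement(target, surrogate):.3f}")
    print(f"evasion rate: {evasion_rate([0, 1, 0, 0, 1]):.3f}")
```

Under these assumptions, a reported agreement of 97-99% would mean the surrogate disagrees with the target on at most about 3 of every 100 evaluated samples.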
