The Power of MEME: Adversarial Malware Creation with Model-Based Reinforcement Learning

Maria Rigaki, Sebastian Garcia

TL;DR

This work tackles automated evasion of static malware detectors under black-box access by introducing MEME, a model-based reinforcement learning framework that jointly performs malware evasion and model extraction to train high-fidelity surrogate models. MEME trains a PPO-based policy against an iteratively refined surrogate built from limited target queries and auxiliary data, achieving evasion rates of $32\%$ to $73\%$ and surrogate label agreement of $97\%$ to $99\%$ with only 2,048 queries. Compared with Random, PPO, MAB, and GAMMA, MEME generally achieves superior evasion performance across four targets, while maintaining efficient surrogate learning and offering practical implications for robustness testing of detectors. The results highlight MEME’s potential for evaluating defenses and guiding the design of more resistant malware detection pipelines, with future work including ensemble surrogates and broader target coverage to further enhance performance and applicability.

Abstract

Due to the proliferation of malware, defenders are increasingly turning to automation and machine learning as part of the malware detection tool-chain. However, machine learning models are susceptible to adversarial attacks, requiring the testing of model and product robustness. Meanwhile, attackers also seek to automate malware generation and evasion of antivirus systems, and defenders try to gain insight into their methods. This work proposes a new algorithm that combines Malware Evasion and Model Extraction (MEME) attacks. MEME uses model-based reinforcement learning to adversarially modify Windows executable binary samples while simultaneously training a surrogate model that has high agreement with the target model being evaded. To evaluate this method, we compare it with two state-of-the-art attacks in adversarial malware creation, using three well-known published models and one antivirus product as targets. Results show that MEME outperforms the state-of-the-art methods in terms of evasion capabilities in almost all cases, producing evasive malware with an evasion rate in the range of 32-73%. It also produces surrogate models whose prediction label agreement with the respective target models is in the range of 97-99%. The surrogate could be used to fine-tune and improve the evasion rate in the future.

Paper Structure (25 sections, 2 equations, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Malware-Gym Architecture
  • Figure 2: High-level description of the MEME algorithm. First, there is an initial training that produces a first $D_{sur}$, which is combined with $D_{aux}$ to train a surrogate, which is used to improve the agent. The loop is repeated $k$ times.
  • Figure 3: Mean evasion rates and standard deviations of all methods tested on all targets.
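
The loop described in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustrative toy, not the authors' implementation: the names `meme_loop`, `collect`, `fit_threshold`, `improve_agent`, and `label_agreement` are hypothetical, the target is a 1-D threshold classifier standing in for a real detector, and the agent update is a stub where a PPO step would go. The sketch also assumes that each round appends freshly target-labelled samples to $D_{sur}$, which the caption does not state explicitly.

```python
from typing import Callable, Iterable, List, Tuple

Labelled = Tuple[float, int]      # (toy 1-D feature, label given by the target)
Model = Callable[[float], int]    # returns 1 for "malicious", 0 for "benign"


def fit_threshold(data: List[Labelled]) -> Model:
    """Toy surrogate: a cut halfway between the two labelled classes."""
    benign = [x for x, y in data if y == 0]
    malicious = [x for x, y in data if y == 1]
    cut = (max(benign) + min(malicious)) / 2
    return lambda x: int(x >= cut)


def label_agreement(a: Model, b: Model, xs: Iterable[float]) -> float:
    """Fraction of inputs on which two models output the same label."""
    xs = list(xs)
    return sum(a(x) == b(x) for x in xs) / len(xs)


def meme_loop(collect, train_surrogate, improve_agent, d_aux, k):
    """Hypothetical skeleton of the loop in the Figure 2 caption."""
    d_sur = list(collect())                   # initial training yields the first D_sur
    surrogate = None
    for _ in range(k):                        # the loop is repeated k times
        surrogate = train_surrogate(d_sur + list(d_aux))  # D_sur combined with D_aux
        improve_agent(surrogate)              # agent trains against the surrogate
        d_sur.extend(collect())               # ASSUMPTION: refresh D_sur from the target
    return surrogate, d_sur


# --- toy usage; every value below is made up for illustration ---
target: Model = lambda x: int(x >= 0.5)       # stands in for the black-box detector
probes = iter([0.1, 0.9, 0.3, 0.7, 0.45, 0.55, 0.48, 0.52])


def collect() -> List[Labelled]:
    """Spend two target queries and return the labelled samples."""
    return [(x, target(x)) for x in (next(probes), next(probes))]


rounds: List[int] = []


def improve_agent(surrogate: Model) -> None:
    """Stub for the policy update; it only records one surrogate prediction."""
    rounds.append(surrogate(0.6))


d_aux: List[Labelled] = [(0.0, 0), (1.0, 1)]  # auxiliary data with known labels
surrogate, d_sur = meme_loop(collect, fit_threshold, improve_agent, d_aux, k=3)

grid = [i / 20 + 0.025 for i in range(20)]    # held-out inputs for the agreement check
agreement = label_agreement(surrogate, target, grid)
print(f"target queries: {len(d_sur)}, label agreement: {agreement:.2f}")
```

The point of the structure is that target queries are spent only inside `collect`, while the agent's many training steps run against the cheap surrogate; `label_agreement` is the kind of metric behind the surrogate agreement figures quoted in the abstract.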