Updating Windows Malware Detectors: Balancing Robustness and Regression against Adversarial EXEmples
Matous Kozak, Luca Demetrio, Dmitrijs Trizna, Fabio Roli
TL;DR
This work addresses the challenge of updating Windows malware detectors to withstand adversarial EXEmples without incurring performance regression. It introduces EXE-scanner, a lightweight plugin that can be attached to any baseline detector to detect and block adversarial EXEmples while preserving prior predictions. In extensive experiments, EXE-scanner matches or surpasses adversarial training in robustness and shows minimal or no regression, even when paired with commercial AV engines. A SHAP analysis reveals the perturbation artifacts that these attacks leave behind. The authors also release a large dataset of adversarial EXEmples and the code to support reproducibility and further research in robust malware detection.
Abstract
Adversarial EXEmples are carefully perturbed programs tailored to evade machine learning Windows malware detectors, and there is an ongoing effort to develop robust models that remain effective at detecting them. However, even if robust models can stop the majority of EXEmples, models must be fine-tuned on newer threats to maintain predictive power over time, which requires either partial updates or time-consuming retraining from scratch. Thus, even if robustness against adversarial EXEmples improves, the new models might suffer a regression in performance by misclassifying threats that were previously correctly detected. For these reasons, we study the trade-off between accuracy and regression when updating Windows malware detectors, and we propose EXE-scanner, a plugin that can be chained to existing detectors to promptly stop EXEmples without causing regression. We empirically show that previously proposed hardening techniques suffer a regression of accuracy when updating non-robust models, and that the gap widens in low false-positive regimes and under temporal drift in the data. Through EXE-scanner we also gain evidence of the detectability of adversarial EXEmples, showing that the process of creating them leaves artifacts inside them. By design, EXE-scanner can be chained to any classifier to obtain the best performance without the need for costly retraining. To foster reproducibility, we openly release the source code, along with the dataset of adversarial EXEmples based on state-of-the-art perturbation algorithms.
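To make the idea of chaining and the notion of regression concrete, the following minimal Python sketch shows one way a plugin could be combined with a baseline detector, and how regression could be counted between an old and an updated model. It is an illustration under stated assumptions, not the released implementation. The OR-style combination rule, the 0.5 thresholds, the "negative flip" way of counting regression, and every name in the code (`chain`, `negative_flip_rate`, the toy detectors, and the toy byte strings) are assumptions introduced here.

```python
from typing import Callable, List, Sequence

# A detector maps raw file bytes to a maliciousness score in [0, 1].
Detector = Callable[[bytes], float]


def chain(baseline: Detector, scanner: Detector,
          baseline_thr: float = 0.5,
          scanner_thr: float = 0.5) -> Callable[[bytes], int]:
    """OR-combine two detectors (assumed chaining rule, for illustration only)."""
    def predict(sample: bytes) -> int:
        if baseline(sample) >= baseline_thr:
            return 1  # the baseline's detection is kept unchanged
        # Only samples the baseline lets through are re-checked by the plugin.
        return int(scanner(sample) >= scanner_thr)
    return predict


def negative_flip_rate(y_true: Sequence[int], old_pred: Sequence[int],
                       new_pred: Sequence[int]) -> float:
    """Fraction of samples the old model got right and the new model gets wrong."""
    if not (len(y_true) == len(old_pred) == len(new_pred)):
        raise ValueError("all inputs must have the same length")
    if len(y_true) == 0:
        return 0.0
    flips = sum(1 for y, o, n in zip(y_true, old_pred, new_pred)
                if o == y and n != y)
    return flips / len(y_true)


# Toy stand-ins (hypothetical): real detectors would parse the PE file.
def toy_baseline(sample: bytes) -> float:
    return 1.0 if b"MALWARE" in sample else 0.0


def toy_scanner(sample: bytes) -> float:
    # Pretend the adversarial perturbation leaves a b"PAD" artifact behind.
    return 1.0 if b"PAD" in sample else 0.0


def toy_retrained(sample: bytes) -> float:
    # A retrained model that learned the artifact but forgot the old signal.
    return 1.0 if b"PAD" in sample else 0.0


samples: List[bytes] = [b"MALWARE payload", b"M4LW4RE payload PAD", b"hello world"]
labels: List[int] = [1, 1, 0]

chained = chain(toy_baseline, toy_scanner)
old_pred = [int(toy_baseline(s) >= 0.5) for s in samples]
retrained_pred = [int(toy_retrained(s) >= 0.5) for s in samples]
chained_pred = [chained(s) for s in samples]

print("regression after retraining:", negative_flip_rate(labels, old_pred, retrained_pred))
print("regression after chaining:  ", negative_flip_rate(labels, old_pred, chained_pred))
```

Under this assumed OR rule, any sample the baseline already flags stays flagged, so detections of previously caught malware cannot be lost. The plugin can still add false positives on benign files, which is why the low false-positive regime matters when choosing its threshold.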
