On the Effectiveness of Adversarial Training on Malware Classifiers

Hamid Bostani; Jacopo Cortellazzi; Daniel Arp; Fabio Pierazzi; Veelasha Moonsamy; Lorenzo Cavallaro

TL;DR

This paper asks how effective adversarial training (AT) is for malware classifiers operating in real-world, discrete feature spaces. It introduces Rubik, a unified, multidimensional evaluation framework that jointly analyzes data, feature representations, classifier types, and robust optimization settings, and applies it to Android malware with static representations such as DREBIN and RAMDA. Through systematic experiments across datasets, attacks (realistic and unrealistic), and domain constraints, the study shows that AT's benefits depend on model architecture, feature-space structure, and the realism of adversarial examples (AEs), challenging prior assumptions that realizable or high-confidence AEs bring universal gains. The findings yield practical recommendations for balancing clean accuracy and robustness, underscore the importance of domain-aware evaluation, and stress that robust malware detectors require carefully aligned end-to-end configurations rather than one-size-fits-all defenses.

Abstract

Adversarial Training (AT) is a key defense against machine learning evasion attacks, but its effectiveness for real-world malware detection remains poorly understood. This uncertainty stems from a critical disconnect in prior research: studies often overlook the inherent nature of malware and are fragmented, either examining variables such as the realism or confidence of adversarial examples in isolation or relying on weak evaluations that yield non-generalizable insights. To address this, we introduce Rubik, a framework for the systematic, multi-dimensional evaluation of AT in the malware domain. The framework defines key factors across essential dimensions, including data, feature representations, classifiers, and robust optimization settings, so that the interplay of AT's influential variables can be explored comprehensively through reliable evaluation practices such as realistic evasion attacks. We instantiate Rubik on Android malware, empirically analyzing how this interplay shapes robustness. Our findings challenge prior beliefs (showing, for instance, that realizable adversarial examples offer only conditional robustness benefits) and reveal new insights, such as the critical role of model architecture and feature-space structure in determining AT's success. From this analysis, we distill four key insights, expose four common evaluation misconceptions, and offer practical recommendations to guide the development of truly robust malware classifiers.
