Intriguing Properties of Adversarial ML Attacks in the Problem Space [Extended Version]

Jacopo Cortellazzi, Feargus Pendlebury, Daniel Arp, Erwin Quiring, Fabio Pierazzi, Lorenzo Cavallaro

TL;DR

This work formulates a principled framework for problem-space adversarial ML, addressing the inverse feature-mapping challenge by introducing transformation-based problem-space attacks and side-effect features. It proves necessary and sufficient conditions for the existence of such attacks and classifies search strategies as problem-driven, feature-driven, or hybrid. Building on this, the authors introduce a novel Android malware problem-space attack using automated software transplantation, demonstrating evasion of a state-of-the-art classifier (DREBIN) and its hardened variant (Sec-SVM) on a dataset of approximately $1.5\times 10^5$ apps, with generation times of a few minutes per instance. They further examine defenses, finding that adversarial training in the problem space markedly improves robustness, whereas adversarial retraining often fails, thereby highlighting the practical impact of realistic, domain-specific adversarial samples. Together, these contributions provide a rigorous, domain-transferable framework for comparing attacks and guiding robust defense design in real-world AI security systems.

Abstract

Recent research efforts on adversarial machine learning (ML) have investigated problem-space attacks, focusing on the generation of real evasive objects in domains where, unlike images, there is no clear inverse mapping to the feature space (e.g., software). However, the design, comparison, and real-world implications of problem-space attacks remain underexplored. This article makes three major contributions. Firstly, we propose a general formalization for adversarial ML evasion attacks in the problem space, which includes the definition of a comprehensive set of constraints on available transformations, preserved semantics, absent artifacts, and plausibility. We shed light on the relationship between feature space and problem space, and we introduce the concept of side-effect features as the by-product of the inverse feature-mapping problem. This enables us to define and prove necessary and sufficient conditions for the existence of problem-space attacks. Secondly, building on our general formalization, we propose a novel problem-space attack on Android malware that overcomes past limitations in terms of semantics and artifacts. We have tested our approach on a dataset of 150K Android apps from 2016 and 2018; the results show the practical feasibility of evading a state-of-the-art malware classifier along with its hardened version. Thirdly, we explore adversarial training as a possible approach to enforce robustness against adversarial samples, evaluating its effectiveness on the considered machine learning models under different scenarios. Our results demonstrate that "adversarial-malware as a service" is a realistic threat, as we automatically generate thousands of realistic and inconspicuous adversarial applications at scale, where on average it takes only a few minutes to generate an adversarial instance.

Paper Structure

This paper contains 31 sections, 3 theorems, 10 equations, 8 figures, 2 tables, 2 algorithms.

Key Result

Theorem 1

Given a problem-space object $z \in \mathcal{Z}$ of class $y \in \mathcal{Y}$, with features $\varphi(z)=\bm{x}$, and a target class $t \in \mathcal{Y}$, $t \neq y$, there exists a transformation sequence $\mathbf{T}$ that causes $\mathbf{T}(z)$ to be misclassified as $t$ only if there is a solution ...
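The visible part of the statement ties the existence of a problem-space attack to the existence of a feature-space solution. As a rough illustration of what such a feasibility check can look like, the sketch below assumes a linear discriminant $h(\bm{x}) = \bm{w}^\top \bm{x} + b$ where negative values indicate the target class (the convention of Figure 1), binary features, and a hypothetical add-only constraint. The function name, the `can_add` mask, and the weights are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def feature_space_attack_exists(w, b, x, can_add):
    """Check whether some add-only perturbation delta gives h(x + delta) < 0.

    Assumptions (hypothetical, for illustration only):
      - h(x) = w . x + b is linear, and h < 0 means the target class;
      - features are binary, and only absent features flagged in
        `can_add` may be switched from 0 to 1.
    """
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=int)
    can_add = np.asarray(can_add, dtype=bool)
    # For a linear h, the score is minimized by adding every allowed
    # absent feature whose weight is negative.
    delta = ((x == 0) & can_add & (w < 0)).astype(int)
    score = float(w @ (x + delta) + b)
    return score < 0, delta

# Hypothetical example: four binary features, all of which may be added.
exists, delta = feature_space_attack_exists(
    w=[1.5, -1.0, -0.8, -2.0], b=0.0, x=[1, 0, 0, 0], can_add=[True] * 4
)
print(exists, delta)
```

If this check fails, a necessary condition of this form implies that no sequence of problem-space transformations can succeed either; if it passes, a problem-space attack may still be impossible, because the stricter problem-space constraints have not been considered.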

Figures (8)

  • Figure 1: Example of projection of the feature-space attack vector $\bm{x}+\bm{\delta}^*$ in the feasible problem space, resulting in side-effect features $\bm{\eta}$. The background displays the value of the discriminant function $h(\bm{x})$, where negative values indicate the target class of the evasion attack. Small arrows represent directions of the negative gradient. The thick solid line represents the feasible feature space determined by $\Omega$, and the thin solid line that determined by $\Gamma$ (which is more restrictive). The dotted arrow represents the gradient-based attack $\bm{x}+\bm{\delta}^*$ derived from $\bm{x}$, which is then projected into $\bm{x}+\bm{\delta}^*+\bm{\eta}$ to fit into the feasible problem space.
  • Figure 2: Performance of SVM and Sec-SVM in the absence of adversarial attacks.
  • Figure 3: Statistics of the evasive malware variants, compared with statistics of benign apps. The dark gray background highlights the area between the first and third quartiles of benign applications; the light gray background is based on the 3$\sigma$ rule and highlights benign statistics between the $0.15$th and $99.85$th percentiles (i.e., spanning $99.7\%$ of the distribution).
  • Figure 4: Breakdown of average number of features injected for each considered classifier.
  • Figure 5: Violin plots of times per adversarial app.
  • ...and 3 more figures
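
The projection described in the Figure 1 caption can be sketched in a few lines. The example below is a toy model, assuming binary features and a hypothetical set of `gadgets`, each of which adds a fixed bundle of features when applied. The greedy selection and all names are illustrative assumptions and do not reproduce the authors' transplantation procedure; the point is only to show how side-effect features $\bm{\eta}$ arise as the gap between the desired perturbation $\bm{\delta}^*$ and what the available transformations actually produce.

```python
import numpy as np

def project_to_problem_space(x, delta_star, gadgets):
    """Realize delta_star using gadgets and report side-effect features.

    Assumptions (hypothetical): features are binary, delta_star only
    requests features that are absent in x, and each row of `gadgets`
    is the full set of features one transformation adds.
    Returns (x_new, eta) with x_new = x + delta_star + eta.
    """
    x = np.asarray(x, dtype=int)
    delta_star = np.asarray(delta_star, dtype=int)
    gadgets = np.asarray(gadgets, dtype=int)
    x_new = x.copy()
    for j in np.flatnonzero(delta_star):
        if x_new[j] == 1:
            continue  # already obtained through an earlier gadget
        for g in gadgets:
            if g[j] == 1:
                # Applying the gadget adds all of its features at once.
                x_new = np.maximum(x_new, g)
                break
    # Whatever differs from the requested perturbation is a side effect.
    eta = x_new - x - delta_star
    return x_new, eta

# Hypothetical example: the first gadget provides feature 1 but also
# drags in feature 2, which the feature-space attack never requested.
x_new, eta = project_to_problem_space(
    x=[1, 0, 0, 0, 0],
    delta_star=[0, 1, 0, 1, 0],
    gadgets=[[0, 1, 1, 0, 0], [0, 0, 0, 1, 0]],
)
print(x_new, eta)
```

In this toy setting a requested feature that no gadget provides shows up as a $-1$ entry in `eta`, so the result should be re-scored with the classifier after projection rather than assumed to remain evasive.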

Theorems & Definitions (17)

  • Definition 1: Feature Mapping
  • Definition 2: Discriminant Function
  • Definition 3: Attack Objective Function
  • Definition 4: Feature-Space Constraints
  • Definition 5: Feature-Space Attack
  • Definition 6: Problem-Space Transformation
  • Definition 7: Transformation Sequence
  • Definition 8: Available Transformations
  • Definition 9: Preserved Semantics
  • Definition 10: Plausibility
  • ...and 7 more