Attacks in Adversarial Machine Learning: A Systematic Survey from the Life-cycle Perspective

Baoyuan Wu; Zihao Zhu; Li Liu; Qingshan Liu; Zhaofeng He; Siwei Lyu

TL;DR

This work delivers a unified, life-cycle–oriented survey of adversarial machine learning (AML) by introducing a general AML formulation based on stealthiness, benign consistency, and adversarial inconsistency, and then organizing backdoor, weight, and adversarial-example attacks within this framework. It provides a comprehensive taxonomy for attacks across five ML life-cycle stages, detailing how triggers, data, and training procedures are designed (pre-training and training stages), or how parameters and inputs are manipulated at deployment and inference time. The paper highlights connections among attack paradigms, reviews extensions to diffusion models and large language models, and discusses applications (e.g., copyright protection, privacy) and defense considerations, while offering a public, continuously updated taxonomy at adversarial-ml.com. Overall, it aims to harmonize disparate AML research streams, illuminate cross-paradigm vulnerabilities, and guide robust, defense-aware development of future ML systems.

Abstract

Adversarial machine learning (AML) studies the adversarial phenomenon of machine learning, in which a model may make predictions that are unexpected or inconsistent with those of humans. Several paradigms have recently been developed to explore this phenomenon at different stages of a machine learning system: backdoor attacks occur at the pre-training, in-training, and inference stages; weight attacks occur at the post-training, deployment, and inference stages; and adversarial attacks occur at the inference stage. Although these paradigms share a common goal, they have developed almost independently, and there is still no big picture of AML. In this work, we aim to provide the AML community with a unified perspective for systematically reviewing the overall progress of the field. We first provide a general definition of AML and then propose a unified mathematical framework that covers existing attack paradigms. Based on this framework, we build a full taxonomy to systematically categorize and review representative existing methods for each paradigm. The framework also makes it easy to identify the connections and differences among attack paradigms, which may inspire future researchers to develop more advanced ones. Finally, to make the taxonomy and the related literature easier to browse, we provide a website, http://adversarial-ml.com, where the taxonomies and literature will be continuously updated.

Paper Structure (122 sections, 24 equations, 4 figures, 10 tables)

Introduction
What is Adversarial Machine Learning
General Definition and Formulation
Three Attack Paradigms at Different Stages of AML
Attack at the Pre-training Stage
Formulation and Categorization
Trigger Generation
Visible vs. Invisible Trigger
Visible trigger
Invisible trigger
Non-semantic vs. Semantic Trigger
Non-semantic trigger
Semantic trigger
Manually Designed Trigger vs. Learnable Trigger
Manually designed trigger
...and 107 more sections

Figures (4)

Figure 1: The full life-cycle of Adversarial Machine Learning
Figure 2: Taxonomy of backdoor attacks at the pre-training, training, and inference stages.
Figure 3: Taxonomy of inference-time adversarial examples.
Figure 4: A brief graphical illustration of three attack paradigms of AML. (1) A binary classification task. (2) Backdoor attack: a backdoored model $f_{\mathbf{w}_{\varepsilon}}(\cdot)$ is trained based on the manipulated training dataset. (3) Weight attack: locally modifying the decision boundary of the benign model $f_{\mathbf{w}_0}(\cdot)$ to change the prediction of the target benign sample. (4) Adversarial example: a benign sample is perturbed to cross the decision boundary of the benign model $f_{\mathbf{w}_0}(\cdot)$.
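
The contrast between panels (3) and (4) of Figure 4 can be sketched on a toy problem. The snippet below is only an illustration under stated assumptions, not the survey's formulation: it assumes a hypothetical 2-D linear binary classifier, and the names `score`, `predict`, `adversarial_example`, `weight_attack`, and `overshoot` are invented here. The adversarial example keeps the weights fixed and moves the input across the boundary, whereas the weight attack keeps the input fixed and moves the boundary. The backdoor case (panel 2) is not covered.

```python
# Toy illustration only: a hypothetical 2-D linear classifier, not the survey's models.

def score(w, b, x):
    """Signed score w.x + b; the predicted class is its sign."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    return 1 if score(w, b, x) >= 0 else -1

def adversarial_example(w, b, x, overshoot=0.1):
    """Perturb the input (weights fixed): step along -w until just past the boundary."""
    s = score(w, b, x)
    norm_sq = sum(wi * wi for wi in w)
    step = (1 + overshoot) * s / norm_sq
    return [xi - step * wi for xi, wi in zip(x, w)]

def weight_attack(w, b, x, overshoot=0.1):
    """Perturb the parameters (input fixed): step (w, b) along -[x, 1] until x flips."""
    s = score(w, b, x)
    norm_sq = sum(xi * xi for xi in x) + 1.0  # squared norm of the augmented input [x, 1]
    step = (1 + overshoot) * s / norm_sq
    new_w = [wi - step * xi for wi, xi in zip(w, x)]
    new_b = b - step
    return new_w, new_b

w0, b0 = [1.0, -1.0], 0.0           # benign model (assumed values)
x = [2.0, 0.5]                      # benign sample, initially predicted as class 1
x_adv = adversarial_example(w0, b0, x)
w_eps, b_eps = weight_attack(w0, b0, x)

print(predict(w0, b0, x), predict(w0, b0, x_adv), predict(w_eps, b_eps, x))
```

In both cases the new score on the attacked sample equals the original score scaled by `-overshoot`, so the prediction flips with a small margin. For a linear model the step direction is the one that reaches the boundary with the smallest L2 change; real attacks on deep models replace this closed form with iterative optimization.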

Theorems & Definitions (1)

Definition 1: Adversarial Machine Learning (AML)
