Towards the Science of Security and Privacy in Machine Learning

Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, Michael Wellman

TL;DR

The paper tackles the fragmented understanding of security and privacy in machine learning by proposing a unified threat model that spans the entire ML data pipeline and maps attacks and defenses onto the CIA triad (confidentiality, integrity, availability) and privacy principles. It formalizes learning through the PAC (probably approximately correct) framework and analyzes training-time poisoning, inference-time adversaries (white-box and black-box), and privacy and fairness considerations. Key contributions include a structured taxonomy of attacks and defenses, links between distribution drift and robustness, and a no-free-lunch theorem illustrating a fundamental trade-off between accuracy and resilience. The work highlights the need to calibrate model complexity, data availability, and defense strategies to environment-specific risks, laying groundwork for robust, private, and accountable ML systems.

Abstract

Advances in machine learning (ML) in recent years have enabled a dizzying array of applications such as data analytics, autonomous systems, and security diagnostics. ML is now pervasive: new systems and models are being deployed in every domain imaginable, leading to rapid and widespread deployment of software-based inference and decision making. There is growing recognition that ML exposes new vulnerabilities in software systems, yet the technical community's understanding of the nature and extent of these vulnerabilities remains limited. We systematize recent findings on ML security and privacy, focusing on attacks identified on these systems and defenses crafted to date. We articulate a comprehensive threat model for ML and categorize attacks and defenses within an adversarial framework. We identify key insights from work in both the ML and security communities and relate the effectiveness of approaches to structural elements of ML algorithms and the data used to train them. We conclude by formally exploring the opposing relationship between model accuracy and resilience to adversarial manipulation. Through these explorations, we show that there are (possibly unavoidable) tensions between model complexity, accuracy, and resilience that must be calibrated for the environments in which they will be used.

Paper Structure

This paper contains 22 sections, 2 theorems, 15 equations, 6 figures.

Key Result

Theorem 1

Fix any hypothesis (function) class $\mathcal{H}$ and distribution $D$, and assume that an attacker exists with probability $q$. Given that the attacker uses an $\alpha$-effective attack against $\mathcal{H}$ and $D$ with $\alpha \geq \alpha_0 > 0$, for all hypotheses $h \in R(\mathcal{H})$ the learner's …
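The kind of tension this theorem formalizes can be illustrated numerically. The sketch below is a hypothetical toy, not the paper's construction or proof: it assumes the learner's expected loss is the $q$-weighted mixture $(1-q)\,L_D(h) + q\,L_{\dot{D}}(h)$ of its loss on the benign distribution $D$ and on an attacker's distribution $\dot{D}$, and it mimics Figure 5 in one dimension with a sparse, non-linear "pocket" that a threshold (linear) classifier cannot represent. The notation $L_D$, the functions `true_label`, `linear_clf`, `complex_clf`, `zero_one_loss`, and `mixture_loss`, and all data points are illustrative assumptions.

```python
import numpy as np

def true_label(x):
    # Hypothetical ground truth: positive for x > 0, except a sparse
    # "pocket" (0.8, 1.0] that is negative, i.e. a non-linear region.
    return ((x > 0) & ~((x > 0.8) & (x <= 1.0))).astype(int)

def linear_clf(x):
    # A 1-D linear separator (threshold at 0): it cannot represent the pocket.
    return (x > 0).astype(int)

def complex_clf(x):
    # A richer, hypothetical hypothesis (two thresholds) that carves out the pocket.
    return ((x > 0) & ~((x > 0.8) & (x <= 1.0))).astype(int)

def zero_one_loss(clf, xs):
    # Empirical 0-1 loss on a finite sample standing in for a distribution.
    return float(np.mean(clf(xs) != true_label(xs)))

def mixture_loss(clean_loss, adv_loss, q):
    # Assumed setting: with probability q the test data come from the
    # attacker's distribution D_dot, otherwise from the benign distribution D.
    return (1 - q) * clean_loss + q * adv_loss

# D: most mass away from the pocket (990 points), little inside it (10 points).
xs_D = np.concatenate([np.linspace(-1.0, 0.79, 990), np.linspace(0.81, 1.0, 10)])
# D_dot: the attacker concentrates all of its mass uniformly inside the pocket.
xs_adv = np.linspace(0.81, 1.0, 100)

q = 0.1
lin_clean, lin_adv = zero_one_loss(linear_clf, xs_D), zero_one_loss(linear_clf, xs_adv)
cx_clean, cx_adv = zero_one_loss(complex_clf, xs_D), zero_one_loss(complex_clf, xs_adv)
print("linear:", lin_clean, lin_adv, mixture_loss(lin_clean, lin_adv, q))
print("two-threshold:", cx_clean, cx_adv, mixture_loss(cx_clean, cx_adv, q))
```

Under this assumed mixture, an attack that forces a loss of at least $\alpha_0$ on $\dot{D}$ keeps the overall loss at or above $q\,\alpha_0$ however small the clean loss is: the threshold classifier has a clean loss of 0.01 but a mixed loss of about 0.109, while the two-threshold hypothesis (which matches the toy ground truth exactly, so over-fitting is ignored) reaches zero.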

Figures (6)

  • Figure 1: System's attack surface: the generic model (top row) is illustrated with two example scenarios (bottom rows): a computer vision model used by an automotive system to recognize traffic signs on the road and a network intrusion detection system.
  • Figure 2: Adversarial Capabilities: adversaries attack ML systems at inference time by exploiting model internal information (white box) or probing the system to infer system vulnerabilities (black box). Adversaries use read or write access to the training data to mimic or corrupt the model.
  • Figure 3: Attacks at inference: all of these works are discussed in the section on inference-time attacks and represent the threat models explored by the research community.
  • Figure 4: Evading infinitesimal defenses using transferability: the defended model is very smooth in neighborhoods of training points, i.e., the gradients of the model outputs with respect to its inputs are zero, so the adversary does not know in which direction to look for adversarial examples. However, the adversary can use a substitute model's gradients to find adversarial examples that transfer back to the defended model. Note that this effect would be exacerbated by models with more than one dimension.
  • Figure 5: Subfigure A shows the data available for learning and the true separator between the positive and negative regions. The top left corner has few data points, which in the PAC model means that the data distribution $D$ has low probability mass over that region. Subfigure B shows the model learned with a hypothesis class $\mathcal{H}$ of linear classifiers. Subfigure C shows all points misclassified by the linear model. Also shown is an adversarially chosen uniform distribution $\dot{D}$ restricted to the red oval in the top left corner. Two observations follow: (1) the red crosses will cause a significant prediction loss under $\dot{D}$ with the linear model shown, and (2) the true separator in this red oval is highly non-linear (compared to the rest of the space), so even the best linear classifier learned w.r.t. $\dot{D}$ will suffer significant expected loss. Subfigure D shows that a more complex non-linear classifier can be more accurate and can provide a lower expected loss against $\dot{D}$ (modulo over-fitting issues).
  • ...and 1 more figure
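Figure 4's transferability argument can be sketched with a toy two-dimensional model. Everything here is a hypothetical stand-in rather than the paper's experiment: the "defended" model is a hard-threshold linear classifier whose output is piecewise constant (so its finite-difference gradient is zero, a crude form of gradient masking), the substitute is a logistic regression trained only on labels obtained by querying the defended model, and the perturbation is a single fast gradient sign method (FGSM) step computed on the substitute. The weights `W_TRUE` and `B_TRUE`, the query budget, the step size `eps`, and all function names are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical "defended" model: a linear rule with a hard-threshold output.
W_TRUE = np.array([1.0, -2.0])
B_TRUE = 0.5

def defended_predict(x):
    # Output is 0/1 and piecewise constant, so its input gradient is zero
    # everywhere except exactly on the decision boundary.
    return (x @ W_TRUE + B_TRUE > 0).astype(float)

def finite_diff_grad(f, x, h=1e-4):
    # Central finite differences: the gradient an attacker would observe.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def sigmoid(z):
    # Clipping avoids overflow in np.exp for large-magnitude scores.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50, 50)))

def train_substitute(X, y, lr=0.5, steps=2000):
    # Logistic regression fit by batch gradient descent on the oracle's labels.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def fgsm_with_substitute(x, y, w_s, b_s, eps):
    # The substitute's cross-entropy gradient w.r.t. the input is (p - y) * w_s;
    # step eps along its sign to increase that loss.
    p = sigmoid(x @ w_s + b_s)
    return x + eps * np.sign((p - y) * w_s)

# Black-box phase: query the defended model and train the substitute.
rng = np.random.default_rng(0)
X_query = rng.uniform(-2.0, 2.0, size=(500, 2))
y_query = defended_predict(X_query)
w_sub, b_sub = train_substitute(X_query, y_query)

# Attack a point the defended model labels 1 (score 0.1, close to the boundary).
x0 = np.array([0.0, 0.2])
eps = 0.1
x_adv = fgsm_with_substitute(x0, defended_predict(x0), w_sub, b_sub, eps)
print("defended gradient at x0:", finite_diff_grad(defended_predict, x0))
print("label before/after:", defended_predict(x0), defended_predict(x_adv))
```

Because the substitute learns roughly the same decision boundary, the sign of its input gradient points across the defended model's boundary even though the defended model's own gradient at `x0` is zero; this is the direction information the adversary recovers through transferability.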

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 2
  • Proof
  • Proof