Adversarial Machine Learning: Bayesian Perspectives

David Rios Insua, Roi Naveiro, Victor Gallego, Jason Poulos

TL;DR

The paper addresses the vulnerability of ML systems to adversarial manipulation and argues that the common knowledge (CK) assumption underlying game-theoretic AML models is inadequate. It proposes a Bayesian Adversarial Risk Analysis (ARA) framework that explicitly models the defender's uncertainty about the attacker's beliefs and goals, and integrates it into supervised learning through the predictive model $p(y|x)$ and the attack model $p(x'|x)$. The approach yields a robust posterior predictive mechanism (RAPPD) for operation and a robust adversarial posterior (RAPD) for training, with practical algorithms based on Approximate Bayesian Computation (ABC) and stochastic gradient methods. Case studies in spam detection and vision tasks show improved robustness, particularly when the defender's uncertainty about attacker behavior is incorporated.
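
To make the roles of $p(y|x)$ and $p(x'|x)$ concrete, the following is a minimal sketch of ABC rejection sampling on a toy problem, not the paper's algorithm. It assumes that the label depends on the observed input $x'$ only through the original input $x$, so that $p(y|x') = \sum_x p(y|x)\, p(x|x')$ with $p(x|x') \propto p(x'|x)\, p(x)$. The prior, the classifier table, the attack model, and all function names are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy setup: x is the count (0..3) of a suspicious word in an email.
STATES = [0, 1, 2, 3]
PRIOR = [0.4, 0.3, 0.2, 0.1]                # assumed p(x)
P_SPAM_GIVEN_X = [0.05, 0.30, 0.70, 0.95]   # assumed p(y = spam | x)


def simulate_attack(x, rng):
    """Assumed attack model p(x' | x): with probability 0.5 one word is removed."""
    if x > 0 and rng.random() < 0.5:
        return x - 1
    return x


def robust_spam_probability(x_observed, n_draws=20000, seed=0):
    """Estimate p(spam | x') by ABC rejection sampling from p(x | x')."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        x = rng.choices(STATES, weights=PRIOR)[0]   # draw an original input from p(x)
        if simulate_attack(x, rng) == x_observed:   # keep it only if the simulated attack matches x'
            accepted.append(x)
    if not accepted:
        raise ValueError("no simulated attack reproduced the observed input")
    # Average the classifier output over the accepted original inputs.
    return sum(P_SPAM_GIVEN_X[x] for x in accepted) / len(accepted)


if __name__ == "__main__":
    x_prime = 1
    print("naive  p(spam | x'):", P_SPAM_GIVEN_X[x_prime])
    print("robust p(spam | x'):", round(robust_spam_probability(x_prime), 3))
```

Under these toy numbers the exact robust probability for $x' = 1$ is $0.6 \cdot 0.30 + 0.4 \cdot 0.70 = 0.46$, compared with $0.30$ for the classifier applied directly to the observed input. The sampled estimate should land close to the exact value.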

Abstract

Adversarial Machine Learning (AML) is emerging as a major field aimed at protecting machine learning (ML) systems against security threats: in certain scenarios there may be adversaries that actively manipulate input data to fool learning systems. This creates a new class of security vulnerabilities that ML systems may face, and a new desirable property, adversarial robustness, that is essential for trusting operations based on ML outputs. Most work in AML is built upon a game-theoretic modelling of the conflict between a learning system and an adversary ready to manipulate input data. This assumes that each agent knows their opponent's interests and uncertainty judgments, facilitating inferences based on Nash equilibria. However, such a common knowledge assumption is not realistic in the security scenarios typical of AML. After reviewing such game-theoretic approaches, we discuss the benefits that Bayesian perspectives provide when defending ML-based systems. We demonstrate how the Bayesian approach allows us to explicitly model our uncertainty about the opponent's beliefs and interests, relaxing unrealistic assumptions and providing more robust inferences. We illustrate this approach in supervised learning settings, and identify relevant future research problems.

Paper Structure

This paper contains 12 sections, 10 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Original input and attacked version.
  • Figure 2: Robustness of a deep network against the PGD attack under three defense mechanisms (NONE, AT, ARA). (a) depicts the security evaluation curves for the attacked Fashion-M. dataset. (b) depicts the respective curves for the attacked Kuzushiji-M. dataset.