Classification Auto-Encoder based Detector against Diverse Data Poisoning Attacks

Fereshteh Razmi, Li Xiong

TL;DR

CAE detects all forms of poisoning attacks using a combination of reconstruction and classification errors, without any prior knowledge of the attack strategy. An enhanced version, CAE+, does not need a clean data set to train the defense model.

Abstract

Poisoning attacks are a category of adversarial machine learning threats in which an adversary attempts to subvert the outcome of a machine learning system by injecting crafted data into the training data set, thus increasing the model's test error. The adversary can tamper with the data feature space, the data labels, or both, each leading to a different attack strategy with different strengths. Various detection approaches have recently emerged, each focusing on one attack strategy. The Achilles heel of many of these approaches is their dependence on access to a clean, untampered data set. In this paper, we propose CAE, a Classification Auto-Encoder based detector against diverse poisoned data. CAE can detect all forms of poisoning attacks using a combination of reconstruction and classification errors without any prior knowledge of the attack strategy. We show that an enhanced version of CAE (called CAE+) does not need a clean data set to train the defense model. Our experimental results on three real datasets (MNIST, Fashion-MNIST, and CIFAR) demonstrate that the proposed method can maintain its functionality under up to 30% contaminated data and help the defended SVM classifier regain its best accuracy.
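The idea of combining reconstruction and classification errors can be illustrated with a minimal sketch. The exact form of the paper's error equation is not reproduced on this page, so the convex combination weighted by `alpha`, the min-max normalisation, and the helper names `combined_error` and `flag_suspects` are all assumptions made for illustration, not the authors' implementation. The per-sample errors are hard-coded stand-ins for what a trained auto-encoder would produce.

```python
import numpy as np


def combined_error(recon_err, class_loss, alpha=0.5):
    """Blend per-sample reconstruction error and classification loss.

    ASSUMPTION: a convex combination with weight `alpha` after min-max
    scaling; the paper's actual equation may differ.
    """
    recon_err = np.asarray(recon_err, dtype=float)
    class_loss = np.asarray(class_loss, dtype=float)

    def minmax(x):
        # Scale to [0, 1] so the two error types are comparable.
        span = x.max() - x.min()
        return np.zeros_like(x) if span == 0 else (x - x.min()) / span

    return alpha * minmax(recon_err) + (1.0 - alpha) * minmax(class_loss)


def flag_suspects(scores, n_suspected):
    """Return the (sorted) indices of the n_suspected highest scores."""
    order = np.argsort(scores)[::-1]
    return np.sort(order[:n_suspected])


# Hypothetical errors for five training points:
# index 2 mimics a feature-space poison (high reconstruction error),
# index 4 mimics a label-flipping poison (high classification loss).
recon_err = [0.10, 0.12, 0.90, 0.11, 0.15]
class_loss = [0.05, 0.07, 0.10, 0.06, 2.30]

scores = combined_error(recon_err, class_loss, alpha=0.5)
suspects = flag_suspects(scores, n_suspected=2)
print(suspects)
```

In this toy setting, using either error alone would miss one of the two suspicious points, which is the intuition behind scoring with both.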

Paper Structure

This paper contains 12 sections, 4 equations, 7 figures.

Figures (7)

  • Figure 1: Auto-encoder structures: (a) The structure of the Classification Auto-encoder (CAE). If trained on a purely clean dataset, it provides a highly successful defense against all poisoning attacks. (b) The structure of CAE+. The Reconstruction Auto-encoder (RAE) and the Classification Auto-encoder (CAE) work together to combat poisons. This joint structure makes the defense more robust even when trained on a contaminated dataset.
  • Figure 2: The effect of different attack types on the reconstruction error and auxiliary classification loss for the poisoned MNIST-4-0 dataset. Triangles and circles represent clean and poisoned points, respectively. The size of each poison represents its impact on degrading the SVM accuracy (larger circles indicate higher impact).
  • Figure 3: Changes in the MNIST-4-0 F1-score over different thresholds for CAE+ and OD. Thresholds are guesses of the probable number of poisoned points within the training dataset.
  • Figure 4: CAE+ F1-score for different values of $\alpha$ (Equation \ref{eq:error_cae}).
  • Figure 5: Ablation study between CAE+, CAE and RAE on MNIST-4-0.
  • ...and 2 more figures
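
The Figure 3 caption treats the threshold as a guess at the number of poisoned points. A minimal sketch of how an F1-score could be computed for such a top-k detector is given below. The function name `f1_at_threshold`, the scores, and the ground-truth labels are hypothetical and only illustrate the evaluation idea; they are not the paper's data or code.

```python
import numpy as np


def f1_at_threshold(scores, is_poison, k):
    """F1-score when the k highest-scoring samples are flagged as poison."""
    scores = np.asarray(scores, dtype=float)
    is_poison = np.asarray(is_poison, dtype=bool)

    flagged = np.zeros(len(scores), dtype=bool)
    if k > 0:
        flagged[np.argsort(scores)[::-1][:k]] = True

    tp = np.sum(flagged & is_poison)    # poisons correctly flagged
    fp = np.sum(flagged & ~is_poison)   # clean points wrongly flagged
    fn = np.sum(~flagged & is_poison)   # poisons that were missed
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else float(2 * tp / denom)


# Hypothetical detector scores; the first two samples are the true poisons.
scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.05]
is_poison = [True, True, False, False, False, False]

# Sweep the guessed number of poisons, as in an F1-versus-threshold plot.
f1_by_k = {k: f1_at_threshold(scores, is_poison, k) for k in range(1, 5)}
print(f1_by_k)
```

With these toy values the F1-score peaks when the guess matches the true number of poisons and drops when the guess is too low or too high.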