Generative Poisoning Attack Method Against Neural Networks

Chaofei Yang, Qing Wu, Hai Li, Yiran Chen

TL;DR

This work analyzes causative poisoning of neural networks during retraining: an attacker injects poisoned data $\mathbf{x}_p$ chosen to maximize the target model's loss on normal inputs, formalized as $\max L(\mathbf{x}_p)$ or $\max_{\mathbf{x}_p} \sum_i L_i^{(p)}$. It contrasts a traditional direct gradient method with a GAN-inspired generative approach in which an autoencoder acts as the generator and the target model acts as the discriminator; the generator is updated with gradients from the discriminator to produce poisoned samples, thereby bypassing costly second-derivative calculations. A loss-based countermeasure detects poisoning by monitoring per-input losses against simple thresholds, offering a low-overhead defense. Experiments on MNIST and CIFAR-10 show that the generative method generates poisoned data up to $239.38\times$ faster than the direct gradient method, with slightly lower model accuracy degradation, highlighting security risks for continually retrained systems and informing practical defenses.
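To make the poisoning objective concrete, here is a minimal sketch of a direct-gradient-style attack on a toy problem. It is not the paper's implementation: the 1-D linear model, the squared-error loss, the single SGD retraining step, the fixed wrong label `y_p`, the `[0, 1]` input range, and all function names are assumptions chosen for illustration. The attacker adjusts the poisoned input `x_p` by gradient ascent so that, after the model retrains on it, the loss on normal data is as large as possible. The term `dw_dxp` differentiates a gradient with respect to `x_p`, which is the kind of second-derivative computation that becomes costly for real networks.

```python
# Toy illustration only: a 1-D linear model y = w * x, not the paper's DNNs.

def normal_loss(w, normal_data):
    """Mean squared error of the model on the clean (normal) data."""
    return sum((w * x - y) ** 2 for x, y in normal_data) / len(normal_data)

def retrain_step(w, x_p, y_p, lr):
    """One SGD retraining step on a single poisoned sample (x_p, y_p)."""
    grad_w = 2.0 * (w * x_p - y_p) * x_p
    return w - lr * grad_w

def attack_objective(w, x_p, y_p, lr, normal_data):
    """Loss on normal data after the model has retrained on the poison."""
    return normal_loss(retrain_step(w, x_p, y_p, lr), normal_data)

def attack_gradient(w, x_p, y_p, lr, normal_data):
    """Chain rule: d(normal loss)/d(w_new) * d(w_new)/d(x_p)."""
    w_new = retrain_step(w, x_p, y_p, lr)
    n = len(normal_data)
    dloss_dw = sum(2.0 * (w_new * x - y) * x for x, y in normal_data) / n
    # Derivative of the weight update w.r.t. the poisoned input.
    dw_dxp = -lr * 2.0 * (2.0 * w * x_p - y_p)
    return dloss_dw * dw_dxp

def craft_poison(w, x_p, y_p, lr, normal_data, step=0.5, iters=100):
    """Gradient ascent on x_p, clamped to an assumed valid range [0, 1]."""
    for _ in range(iters):
        x_p += step * attack_gradient(w, x_p, y_p, lr, normal_data)
        x_p = min(1.0, max(0.0, x_p))
    return x_p

# Clean data follows y = 2x, and the model starts at the clean optimum w = 2.
normal_data = [(0.5, 1.0), (1.0, 2.0), (1.5, 3.0)]
w0, lr, y_p, x_start = 2.0, 0.1, -1.0, 0.5

x_poison = craft_poison(w0, x_start, y_p, lr, normal_data)
loss_before = attack_objective(w0, x_start, y_p, lr, normal_data)
loss_after = attack_objective(w0, x_poison, y_p, lr, normal_data)
print(f"x_p: {x_start} -> {x_poison}, normal loss: {loss_before:.4f} -> {loss_after:.4f}")
```

In this toy setting the gradient has a closed form. For a deep network the same quantity requires differentiating through the weight update, which is the cost the generative method is described as avoiding.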

Abstract

Poisoning attacks are identified as a severe security threat to machine learning algorithms. In many applications, for example, deep neural network (DNN) models collect public data as inputs for re-training, and this input data can be poisoned. Although poisoning attacks against support vector machines (SVMs) have been extensively studied, there is still very limited knowledge about how such attacks can be implemented on neural networks (NNs), especially DNNs. In this work, we first examine the possibility of applying a traditional gradient-based method (named the direct gradient method) to generate poisoned data against NNs by leveraging the gradient of the target model w.r.t. the normal data. We then propose a generative method to accelerate the generation rate of the poisoned data: an auto-encoder (generator) used to generate poisoned data is updated by a reward function of the loss, and the target NN model (discriminator) receives the poisoned data to calculate the loss w.r.t. the normal data. Our experimental results show that the generative method can speed up the poisoned data generation rate by up to 239.38x compared with the direct gradient method, with slightly lower model accuracy degradation. A countermeasure is also designed to detect such poisoning attack methods by checking the loss of the target model.
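The loss-checking countermeasure can be sketched as a simple threshold test. This is a hedged illustration rather than the paper's detector: the rule "mean plus `k` standard deviations of clean losses", the default `k = 3.0`, the example loss values, and the function names `fit_threshold` and `flag_suspicious` are all assumptions made here. The idea it shows is that losses measured on trusted data set a baseline, and incoming inputs whose loss exceeds that baseline are flagged before they are used for re-training.

```python
import statistics

def fit_threshold(clean_losses, k=3.0):
    """Assumed rule: threshold = mean + k * std of losses on trusted data."""
    mu = statistics.mean(clean_losses)
    sigma = statistics.pstdev(clean_losses)
    return mu + k * sigma

def flag_suspicious(losses, threshold):
    """Return the indices of inputs whose loss exceeds the threshold."""
    return [i for i, loss in enumerate(losses) if loss > threshold]

# Hypothetical per-input losses of the target model (made-up numbers).
clean_losses = [0.10, 0.12, 0.09, 0.11, 0.13]
incoming_losses = [0.11, 0.95, 0.10, 1.40]

threshold = fit_threshold(clean_losses)
suspicious = flag_suspicious(incoming_losses, threshold)
print(f"threshold={threshold:.4f}, suspicious indices={suspicious}")
```

The check adds only one loss evaluation per incoming input. Choosing `k` trades false alarms on hard but legitimate inputs against missed poisoned samples.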

Paper Structure

This paper contains 13 sections, 6 equations, 5 figures, 1 table, 2 algorithms.

Figures (5)

  • Figure 1: An overview of the direct gradient method.
  • Figure 2: An overview of the generative method.
  • Figure 3: The process of poisoned data generation under different configurations.
  • Figure 4: The trend of the loss and accuracy of the direct gradient method under different group sizes for the MNIST dataset.
  • Figure 5: The difference between normal and poisoned losses.