Inverting Gradient Attacks Makes Powerful Data Poisoning

Wassim Bouaziz, El-Mahdi El-Mhamdi, Nicolas Usunier

TL;DR

This paper demonstrates that data poisoning, when coupled with gradient inversion, can replicate the damaging effects of gradient attacks on non-convex neural networks, achieving availability attacks with poison fractions as low as 1%. By reconstructing poisoning data from malicious gradients, the authors show that data poisoning can degrade model performance to random level under realistic training settings and defenses. They report experiments on CIFAR-10 with CNN and ViT architectures, comparing gradient attacks and data poisoning under common update rules and aggregators, and examine how the attacks interact with defenses such as MultiKrum. The work highlights a critical vulnerability: gradient attacks and data poisoning are more closely related than previously thought, underscoring the need for stronger defenses against both attack vectors in non-convex settings.

Abstract

Gradient attacks and data poisoning tamper with the training of machine learning algorithms to maliciously alter them, and have been proven to be equivalent in convex settings. The extent of harm these attacks can produce in non-convex settings is still to be determined. Gradient attacks can affect far fewer systems than data poisoning but have been argued to be more harmful since they can be arbitrary, whereas data poisoning restricts the attacker to injecting data points into training sets, e.g. via legitimate participation in a collaborative dataset. This raises the question of whether the harm caused by gradient attacks can be matched by data poisoning in non-convex settings. In this work, we provide a positive answer in a worst-case scenario and show how data poisoning can mimic a gradient attack to perform an availability attack on (non-convex) neural networks. Through gradient inversion, commonly used to reconstruct data points from actual gradients, we show how reconstructing data points from malicious gradients can be sufficient to perform a range of attacks. This allows us to show, for the first time, an availability attack on neural networks through data poisoning that degrades the model's performance to random level with a minority (as low as 1%) of poisoned points.
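The idea of inverting a malicious gradient into a data point can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a linear model with squared loss instead of a neural network, a cosine-similarity objective, a box feasible set $[0, 1]^d$, a "gradient ascent" target (the negated average gradient on a small auxiliary dataset), and finite differences in place of automatic differentiation. All function names (`param_grad`, `invert_gradient`, and so on) are hypothetical.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def param_grad(w, x, y):
    """Gradient w.r.t. w of the squared loss 0.5 * (w.x - y)^2 on one point."""
    residual = dot(w, x) - y
    return [residual * xi for xi in x]

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot(a, b) / (na * nb)

def project(x, lo=0.0, hi=1.0):
    """Clip a point back into the (assumed) feasible box [lo, hi]^d."""
    return [min(hi, max(lo, xi)) for xi in x]

def invert_gradient(w, target, x0, y, steps=200, lr=0.1, eps=1e-5):
    """Search for a feasible x whose parameter gradient aligns with `target`.

    Projected ascent on the cosine similarity, using finite differences and
    accepting a step only when it improves the score.
    """
    x = project(x0)
    best = cosine(param_grad(w, x, y), target)
    for _ in range(steps):
        fd = []
        for i in range(len(x)):
            xp, xm = list(x), list(x)
            xp[i] += eps
            xm[i] -= eps
            up = cosine(param_grad(w, xp, y), target)
            down = cosine(param_grad(w, xm, y), target)
            fd.append((up - down) / (2 * eps))
        step, improved = lr, False
        for _ in range(20):  # backtracking: halve the step until it helps
            cand = project([xi + step * gi for xi, gi in zip(x, fd)])
            score = cosine(param_grad(w, cand, y), target)
            if score > best:
                x, best, improved = cand, score, True
                break
            step *= 0.5
        if not improved:
            break
    return x, best

# Toy setup (all values are illustrative assumptions).
w = [0.5, -0.3, 0.8]                                   # current parameters
aux = [([0.2, 0.9, 0.4], 1.0), ([0.7, 0.1, 0.5], 0.0)]  # auxiliary dataset
grads = [param_grad(w, x, y) for x, y in aux]
honest = [sum(col) / len(grads) for col in zip(*grads)]
target = [-g for g in honest]                          # malicious gradient

x0, y_p = [0.5, 0.5, 0.5], 1.0
start = cosine(param_grad(w, x0, y_p), target)
x_p, final = invert_gradient(w, target, x0, y_p)
print(f"cosine before: {start:.3f}, after: {final:.3f}, poison: {x_p}")
```

In this toy, the gradient of a single point is always a scalar multiple of the point itself, so a poison confined to the box can only reach some gradient directions. That is a small-scale version of the gap between unrestricted gradient attacks and data poisoning restricted to a feasible set.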

Paper Structure

This paper contains 31 sections, 4 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Territory of known availability attacks (in red) within a domain of constraints. The closer to the origin, the more constrained the setting is for the attacker and the harder it is to realize an availability attack. $\spadesuit$: geiping2021witches, zhao_clpa_2022, ning2021invisible, huang2020metapoison; $\heartsuit$: blanchard2017byzantinetolerant, baruch2019little; $\clubsuit$: mhamdi2018hidden; $\diamondsuit$ (so far only in convex settings): farhadkhani2022equivalence; A: Result A in the results subsection; B: Result B in the results subsection.
  • Figure 2: Images of the gradient operator on different sets. $\mathbb{R}^d$ is where an attacker can craft unrestricted gradient attacks. $\nabla_{\theta} L(h_{\theta}(\mathcal{X}), \mathcal{Y})$ is the set of possible gradients given an unrestricted data poisoning (Result B in the results subsection), and $\nabla_{\theta} L(h_{\theta}(\mathcal{F}_{\mathcal{X}}), \mathcal{F}_{\mathcal{Y}})$ is the set of possible gradients when data poisoning is restricted to a feasible set $\mathcal{F}_{\mathcal{X}} \times \mathcal{F}_{\mathcal{Y}} \subseteq \mathcal{X} \times \mathcal{Y}$ (Result A in the results subsection).
  • Figure 3: Threat model. The attacker has access to $\theta$ but does not have access to the batch $S^{b}_{t}$ and uses an auxiliary dataset $D_{a}$ to craft $S^{p}_{t}$ the set of poisoned messages. Both the batch and the poisons set are gathered into $S^{b \cup p}_{t}$. The attacker's goal is either to slow down the training or attack the model's availability.
  • Figure 4: Landscape of accuracies after 1 poisoned step with the respective poison with two levels of contamination $\alpha= 0.005$ (left) and $\alpha=0.1$ (center). The black squares represent the border of the feasible domain $\mathcal{F}$. Right: The minimum accuracy in the landscape for different levels of contamination $\alpha$.
  • Figure 5: Validation accuracies during training in the $\textsc{SGD}$ & $\textsc{Average}$ setting under different attacks and different levels $\alpha$ of contamination. Error bars represent the standard error.
  • ...and 6 more figures
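
The threat model of Figure 3 and the $\textsc{SGD}$ & $\textsc{Average}$ setting of Figure 5 can be made concrete with a short worked equation. This is a sketch under stated assumptions, not a formula quoted from the paper: it assumes that the contamination level is $\alpha = |S^{p}_{t}| / |S^{b \cup p}_{t}|$, that $\textsc{Average}$ is the plain mean of per-sample gradients, and that $\eta$ (a symbol introduced here) is the learning rate. Writing $\bar{g}^{\,b}_{t}$ and $\bar{g}^{\,p}_{t}$ for the mean gradients over the honest batch and the poisons:

$$
\theta_{t+1} = \theta_{t} - \eta \, \frac{1}{|S^{b \cup p}_{t}|} \sum_{(x, y) \in S^{b \cup p}_{t}} \nabla_{\theta} L(h_{\theta_{t}}(x), y) = \theta_{t} - \eta \left[ (1 - \alpha) \, \bar{g}^{\,b}_{t} + \alpha \, \bar{g}^{\,p}_{t} \right]
$$

Under these assumptions, a small $\alpha$ (for example $\alpha = 0.01$) leaves the poisons a weight of only $0.01$ in the average, so their gradients must be large in norm or well aligned against $\bar{g}^{\,b}_{t}$ to dominate the update.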