Backdoor Learning Curves: Explaining Backdoor Poisoning Beyond Influence Functions

Antonio Emanuele Cinà, Kathrin Grosse, Sebastiano Vascon, Ambra Demontis, Battista Biggio, Fabio Roli, Marcello Pelillo

TL;DR

This paper introduces backdoor learning curves as a unifying framework to explain backdoor poisoning through incremental learning and influence functions. It identifies three key factors—model complexity controlled by hyperparameters, the fraction of poisoned samples, and the size/visibility of the backdoor trigger—that govern how quickly a model learns to associate a trigger with the attacker-chosen class. The authors define the backdoor learning slope, relate it to influence functions, and propose parameter-deviation metrics to study how learning backdoors alters model weights. A notable finding is the existence of a region in hyperparameter space where clean accuracy remains high while backdoor vulnerability is low, suggesting practical defenses via regularization and careful hyperparameter selection.

Abstract

Backdoor attacks inject poisoning samples during training, with the goal of forcing a machine learning model to output an attacker-chosen class when presented with a specific trigger at test time. Although backdoor attacks have been demonstrated in a variety of settings and against different models, the factors affecting their effectiveness are still not well understood. In this work, we provide a unifying framework to study the process of backdoor learning under the lens of incremental learning and influence functions. We show that the effectiveness of backdoor attacks depends on: (i) the complexity of the learning algorithm, controlled by its hyperparameters; (ii) the fraction of backdoor samples injected into the training set; and (iii) the size and visibility of the backdoor trigger. These factors affect how fast a model learns to correlate the presence of the backdoor trigger with the target class. Our analysis unveils the intriguing existence of a region in the hyperparameter space in which the accuracy on clean test samples is still high while backdoor attacks are ineffective, thereby suggesting novel criteria to improve existing defenses.
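
To make the attack setting in the abstract concrete, the following is a minimal sketch of backdoor poisoning for image data, not the authors' implementation. The function name `inject_backdoor`, the bottom-right patch placement, the default `trigger_size` and `trigger_value`, and the convention that `p` is measured relative to the clean training set size are all assumptions made for illustration.

```python
import numpy as np


def inject_backdoor(X, y, target_class, p, trigger_size=3, trigger_value=1.0, seed=0):
    """Return a training set with backdoor samples appended (illustrative sketch).

    X: float array of shape (n, height, width) with pixel values in [0, 1].
    y: integer labels of shape (n,).
    p: number of backdoor samples as a fraction of the clean set size
       (an assumption of this sketch; other conventions exist).
    """
    rng = np.random.default_rng(seed)
    n_poison = int(round(p * len(X)))
    # Only samples outside the target class are useful carriers for the trigger.
    candidates = np.flatnonzero(y != target_class)
    idx = rng.choice(candidates, size=n_poison, replace=False)
    # Fancy indexing returns a copy, so the clean data is left untouched.
    X_bd = X[idx]
    # Stamp a square patch (the trigger) in the bottom-right corner.
    X_bd[:, -trigger_size:, -trigger_size:] = trigger_value
    # Relabel every triggered sample with the attacker-chosen class.
    y_bd = np.full(n_poison, target_class, dtype=y.dtype)
    return np.concatenate([X, X_bd]), np.concatenate([y, y_bd])
```

In this sketch, `p` and `trigger_size` correspond to factors (ii) and (iii) above, while factor (i) enters only when a model is trained on the returned data.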

Paper Structure

This paper contains 14 sections, 7 equations, 28 figures, 1 table.

Figures (28)

  • Figure 1: Backdoor learning curves. Considering an SVM with the RBF kernel ($\gamma = 10$) on a toy dataset in two dimensions, we show the influence of model complexity (controlled by the regularization hyperparameter $\lambda=\frac{1}{C}$) on backdoor learning. For both the strong (left) and weak (right) regularization settings, we report two plots. The left plot shows the two-dimensional data (dots) and decision surface for different values of $\beta$ (green lines). The right plot shows the backdoor learning curve, i.e. how the loss decreases as $\beta$ ranges from 0 to 1, which amounts to learning the backdoor samples. We plot both the loss on the clean test samples (orange dotted line) and on the test samples with the backdoor trigger (blue line). The slope of these curves represents the speed with which the model learns to classify the backdoor samples (black dots) as blue dots, unveiling that strong regularization slows down such a process.
  • Figure 2: Backdoor learning curves for: (top row) logistic classifier (LC) on MNIST 7 vs. 1 with $\lambda \in \{10, 0.01\}$ and trigger size $3 \times 3$ (left) or $6 \times 6$ (right); (middle row) Ridge classifier on CIFAR10 airplane vs. frog with $\lambda \in \{100000, 100\}$ and trigger size $8 \times 8$ (left) or $16 \times 16$ (right); (bottom row) RBF SVM with $\gamma = 10^{-4}$ on Imagenette tench vs. truck with $\lambda \in \{10, 0.1\}$ and trigger visibility $c_m = 10$ (left) or $c_m = 75$ (right). Darker lines represent a higher fraction of poisoning samples $p$ injected into the training set. We report the loss on the clean test samples (TS) with a dashed line and on the test samples with the backdoor trigger (TS+BT) with a solid line.
  • Figure 3: Backdoor learning curves for ResNet18 trained on the full Imagenette training dataset for $10$ and $50$ epochs. Darker lines represent a higher fraction of poisoning samples $p$ injected into the training set. We report the loss on the clean test samples (TS) with a dashed line and on the test samples with the backdoor trigger (TS+BT) with a solid line.
  • Figure 4: Backdoor learning curves for MNIST 7 vs. 1 (top row), CIFAR10 airplane vs. frog (middle row) and Imagenette tench vs. truck (bottom row) when changing the kernel parameter $\gamma$ of the RBF SVM. Darker lines represent a higher fraction of poisoning samples $p$ injected into the training set. We report the loss on the clean test samples (TS) with a dashed line and on the test samples with the backdoor trigger (TS+BT) with a solid line.
  • Figure 5: Backdoor slope $\theta$ vs. clean accuracy (red) and backdoor effectiveness (blue) on MNIST 7 vs. 1 with backdoor trigger size $3 \times 3$ (top row) and $6 \times 6$ (bottom row). We measure the classification accuracy on the untainted test samples (TS), and on the same samples after injecting the backdoor trigger (TS+BT). We chose the $\gamma$ parameter for the RBF kernel as $\gamma_0 = 5 \times 10^{-4}$ (orange triangle for clean data, light blue plus for data with trigger) and $\gamma_1 = 5 \times 10^{-3}$ (red inverted triangle for clean data, dark blue x for data with trigger).
  • ...and 23 more figures
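
The captions above describe a backdoor learning curve as the loss traced while $\beta$ ranges from 0 to 1. The sketch below shows one way such a curve could be computed. It is not the authors' code, and it assumes that $\beta$ scales the contribution of the backdoor samples to the training loss, which is a reading of the Figure 1 caption rather than a definition stated in this summary. The L2-regularized logistic model trained by plain gradient descent, the toy data with a dedicated trigger feature, and all function names are hypothetical choices for illustration.

```python
import numpy as np


def fit_logistic(X, y, sample_weight, lam, n_iter=2000, lr=0.1):
    """L2-regularized logistic regression via gradient descent; y in {0, 1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        prob = 0.5 * (1.0 + np.tanh(0.5 * (X @ w + b)))  # stable sigmoid
        g = sample_weight * (prob - y)                   # weighted residuals
        w -= lr * (X.T @ g / len(y) + lam * w)
        b -= lr * g.mean()
    return w, b


def log_loss(X, y, w, b):
    """Mean logistic loss, written in a numerically stable form."""
    z = X @ w + b
    return np.mean(np.logaddexp(0.0, z) - y * z)


def backdoor_learning_curve(X_clean, y_clean, X_bd, y_bd,
                            X_test_bd, y_test_bd, lam, betas):
    """Loss on triggered test samples as the backdoor weight beta grows."""
    X = np.vstack([X_clean, X_bd])
    y = np.concatenate([y_clean, y_bd])
    losses = []
    for beta in betas:
        # Clean samples keep weight 1; backdoor samples are weighted by beta.
        sw = np.concatenate([np.ones(len(y_clean)), beta * np.ones(len(y_bd))])
        w, b = fit_logistic(X, y, sw, lam)
        losses.append(log_loss(X_test_bd, y_test_bd, w, b))
    return np.array(losses)


def make_toy_data(n=100, n_bd=20, seed=0):
    """Two Gaussian blobs; the third feature is 1 only when the trigger is present."""
    rng = np.random.default_rng(seed)

    def blob(m, mean, trigger):
        pts = rng.normal(loc=[mean, 0.0], scale=1.0, size=(m, 2))
        return np.hstack([pts, np.full((m, 1), trigger)])

    X_clean = np.vstack([blob(n, -2.0, 0.0), blob(n, 2.0, 0.0)])
    y_clean = np.concatenate([np.zeros(n), np.ones(n)])
    # Backdoor samples: class-0 points carrying the trigger, labeled as class 1.
    X_bd, y_bd = blob(n_bd, -2.0, 1.0), np.ones(n_bd)
    X_test_bd, y_test_bd = blob(n, -2.0, 1.0), np.ones(n)
    return X_clean, y_clean, X_bd, y_bd, X_test_bd, y_test_bd
```

Calling `backdoor_learning_curve(*make_toy_data(), lam=0.01, betas=np.linspace(0.0, 1.0, 5))` returns one loss value per $\beta$. Repeating the call with a larger `lam` gives a rough way to compare curves under stronger regularization, and `np.gradient(curve, betas)` provides a crude finite-difference stand-in for the slope, which the paper instead relates to influence functions.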