Generalization Bound and New Algorithm for Clean-Label Backdoor Attack

Lijia Yu, Shuang Liu, Yibo Miao, Xiao-Shan Gao, Lijun Zhang

TL;DR

This work derives algorithm-independent generalization bounds for clean-label backdoor attacks. The bounds account for the non-i.i.d. nature of poisoned training data and control both the clean and the poison population errors in terms of the empirical error on the poisoned training set. Building on these bounds, the authors propose a new clean-label backdoor attack whose trigger combines adversarial noise and indiscriminate poison and is designed to satisfy three conditions under which the poison generalization error is bounded. The attack is evaluated on CIFAR-10, CIFAR-100, SVHN, and TinyImageNet, where it achieves high attack success rates with modest poison budgets and shows some resilience against common defenses. Overall, the paper links generalization theory to backdoor poisoning and offers a trigger design strategy along with insights for defense research.

Abstract

The generalization bound is a crucial theoretical tool for assessing the generalizability of learning methods, and there is a vast literature on the generalizability of normal learning, adversarial learning, and data poisoning. Unlike other data poisoning attacks, the backdoor attack has the special property that the poisoned triggers are contained in both the training set and the test set, and the purpose of the attack is two-fold. To our knowledge, the generalization bound for the backdoor attack has not been established. In this paper, we fill this gap by deriving algorithm-independent generalization bounds in the clean-label backdoor attack scenario. Precisely, based on the goals of the backdoor attack, we give upper bounds for the clean sample population errors and the poison population errors in terms of the empirical error on the poisoned training dataset. Furthermore, based on the theoretical result, we propose a new clean-label backdoor attack that computes the poisoning trigger by combining adversarial noise and indiscriminate poison. We show its effectiveness in a variety of settings.
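The clean-label setting described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the trigger here is a random placeholder, whereas the paper computes it from adversarial noise and indiscriminate poison. The function name `clean_label_poison`, the parameter `alpha` (treated as a fraction rather than a percentage), and the default budget `eps = 16/255` (taken from the Figure 1 caption) are assumptions made for illustration.

```python
import numpy as np

def clean_label_poison(images, labels, trigger, target_label, alpha,
                       eps=16 / 255, seed=0):
    """Add a bounded trigger to a fraction `alpha` of the target-class images.

    Labels are returned unchanged, which is what makes the attack clean-label.
    `images` is assumed to hold pixel values in [0, 1].
    """
    rng = np.random.default_rng(seed)
    # Enforce the L_inf budget on the (hypothetical) trigger.
    trigger = np.clip(trigger, -eps, eps)
    # Only samples that already carry the target label are candidates.
    target_idx = np.flatnonzero(labels == target_label)
    n_poison = int(np.floor(alpha * len(target_idx)))
    chosen = rng.choice(target_idx, size=n_poison, replace=False)
    poisoned = images.copy()
    # Keep poisoned pixels inside the valid range.
    poisoned[chosen] = np.clip(poisoned[chosen] + trigger, 0.0, 1.0)
    return poisoned, labels.copy(), chosen

# Toy data standing in for a real dataset.
rng = np.random.default_rng(1)
images = rng.uniform(0.0, 1.0, size=(40, 3, 8, 8))
labels = np.arange(40) % 4                      # 10 samples per class
trigger = rng.uniform(-1.0, 1.0, size=(3, 8, 8))  # placeholder trigger
poisoned, new_labels, chosen = clean_label_poison(
    images, labels, trigger, target_label=2, alpha=0.5
)
```

Because only target-class samples are modified and no label is changed, the poisoned set differs from the clean one solely through a perturbation of size at most `eps` per pixel.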

Paper Structure

This paper contains 40 sections, 34 theorems, 73 equations, 4 figures, 14 tables, 1 algorithm.

Key Result

Theorem 1.1

Let $\mathcal{F}$ be any neural network with fixed depth and width, let $N=|\mathcal{D}_{\mathrm{tr}}|$, and suppose that no more than $\alpha$ percent of the samples labeled $l_p$ in $\mathcal{D}_{\mathrm{tr}}$ are poisoned. Then with high probability, we have

Figures (4)

  • Figure 1: From top to bottom: clean images, normalized triggers (the original trigger has an $L_\infty$ norm bound of $16/255$), and poison images. Due to the selection of $U$, the upper left corners of the poison images are similar, while the other parts are used to generate adversaries.
  • Figure 2: Attack performance during training on CIFAR-10 with ResNet18 and VGG16. The figure shows the trends of the poison model accuracy ($A$), the attack success rate (ASR), and the clean model accuracy ($A_c$).
  • Figure 3: Performance for different target labels $l_p$. We show the poison model accuracy ($A$), the accuracy on the target label ($A_t$), and the attack success rate ($A_p$) on CIFAR-10, using VGG16 and ResNet18.
  • Figure 4: When the trigger is a patch without a norm constraint, it is visible. This figure is from souri2022sleeper.
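
The quantities reported in the figures can be made concrete with a short sketch. The definitions below follow common usage in the backdoor literature and are assumptions, since this summary does not reproduce the paper's exact definitions: accuracy is the fraction of correct predictions, and the attack success rate is the fraction of triggered samples from non-target classes that the model assigns to the target label $l_p$. The function names are hypothetical.

```python
import numpy as np

def accuracy(pred, labels):
    """Fraction of samples whose prediction matches the true label."""
    return float(np.mean(pred == labels))

def attack_success_rate(pred_on_triggered, labels, target_label):
    """Fraction of triggered non-target samples classified as the target label.

    Samples whose true label already equals `target_label` are excluded,
    because predicting the target for them is not evidence of a backdoor.
    """
    mask = labels != target_label
    return float(np.mean(pred_on_triggered[mask] == target_label))

# Toy predictions standing in for a trained model's outputs.
labels = np.array([0, 1, 2, 1, 0])
pred_clean = np.array([0, 1, 1, 1, 0])       # predictions on clean inputs
pred_triggered = np.array([0, 0, 2, 0, 1])   # predictions on triggered inputs

acc = accuracy(pred_clean, labels)
asr = attack_success_rate(pred_triggered, labels, target_label=0)
```

In this toy example four of five clean predictions are correct, and two of the three non-target samples are pushed to the target label once the trigger is applied.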

Theorems & Definitions (69)

  • Theorem 1.1: Informal
  • Theorem 1.2: Informal
  • Theorem 4.1
  • Remark 4.2
  • Remark 4.3
  • Remark 4.4
  • Theorem 4.5
  • Remark 4.6
  • Remark 4.7
  • Remark 4.8
  • ...and 59 more