Stealthy Backdoor Attack via Confidence-driven Sampling

Pengfei He, Yue Xing, Han Xu, Jie Ren, Yingqian Cui, Shenglai Zeng, Jiliang Tang, Makoto Yamada, Mohammad Sabokrou

TL;DR

This work addresses stealth in backdoor attacks by demonstrating that random poisoning sampling is detectable and proposing Confidence-driven Boundary Sampling (CBS) to select boundary-adjacent, low-confidence samples using a surrogate model. CBS is designed to be compatible with a wide range of trigger designs and backdoor techniques, and it is supported by theoretical (SVM-based) and empirical analyses showing reduced detectability and maintained attack efficacy under defenses. Across CIFAR-10/100, Tiny ImageNet, and ImageNet-1k, CBS consistently improves resilience against both detection-based and non-detection-based defenses, with clear tradeoffs controlled by the boundary-uncertainty parameter $\boldsymbol{\epsilon}$ and poisoning rate. The results imply that sampling strategy is a crucial lever in backdoor stealth and that CBS can adapt to real-world constraints, including partial data access, while highlighting opportunities to balance effectiveness and stealth in future defenses. Overall, the paper delivers a novel, versatile sampling mechanism that enhances backdoor stealthiness and motivates more robust defense strategies against boundary-adjacent poisoning.

Abstract

Backdoor attacks aim to surreptitiously insert malicious triggers into DNN models, granting unauthorized control at test time. Existing methods lack robustness against defense strategies and predominantly focus on enhancing trigger stealthiness while selecting the poisoned samples at random. Our research highlights the overlooked drawbacks of random sampling, which make the attack detectable and easier to defend against. The core idea of this paper is to strategically poison samples near the model's decision boundary, which makes defense more difficult. We introduce a straightforward yet highly effective sampling methodology that leverages confidence scores. Specifically, it selects samples with lower confidence scores, significantly increasing the challenge for defenders in identifying and countering these attacks. Importantly, our method operates independently of existing trigger designs, providing versatility and compatibility with various backdoor attack techniques. We substantiate the effectiveness of our approach through a comprehensive set of empirical experiments, demonstrating its potential to significantly enhance the resilience of backdoor attacks on DNNs against defenses.
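
The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a surrogate model has already produced softmax confidences `probs`, and it treats a sample as boundary-adjacent when the gap between its confidence on its own label and on the target class is at most a threshold `eps`. The function name `select_boundary_samples`, the `budget` argument, the exclusion of samples already labeled with the target class, and the smallest-gap-first ordering are all assumptions made for illustration.

```python
import numpy as np

def select_boundary_samples(probs, labels, target, eps, budget=None):
    """Return indices of candidate samples to poison (illustrative sketch).

    probs  : (N, C) softmax confidences from a surrogate model (assumed given)
    labels : (N,) integer ground-truth labels
    target : integer target class of the backdoor
    eps    : confidence-gap threshold; smaller means closer to the boundary
    budget : optional cap on the number of selected samples (hypothetical)
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    idx = np.arange(len(labels))

    # Gap between confidence on the true label and on the target class.
    gap = np.abs(probs[idx, labels] - probs[idx, target])

    # Assumption: only samples from non-target classes are candidates.
    candidates = idx[(labels != target) & (gap <= eps)]

    # Assumption: prefer the samples with the smallest gap first.
    candidates = candidates[np.argsort(gap[candidates], kind="stable")]
    if budget is not None:
        candidates = candidates[:budget]
    return candidates

# Toy confidences for four samples and three classes.
example_probs = [[0.90, 0.10, 0.00],
                 [0.50, 0.45, 0.05],
                 [0.20, 0.70, 0.10],
                 [0.40, 0.35, 0.25]]
example_labels = [0, 0, 1, 2]
print(select_boundary_samples(example_probs, example_labels, target=1, eps=0.2))
```

In this toy input, sample 0 is confidently classified and is skipped, sample 2 already belongs to the target class, and samples 1 and 3 fall within the threshold. Because the sketch only consumes confidence scores, it is independent of which trigger is later applied to the selected samples.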

Paper Structure (41 sections, 5 theorems, 29 equations, 11 figures, 14 tables, 2 algorithms)

Key Result

Theorem 4.2

Assume $\tilde{x}=x+\epsilon t/\|t\|_2:=x+a$ for some arbitrary $x$, some trigger $t$, and strength $\epsilon$. Also assume $\mu_2=0$ and $(\mu_2-\mu_1)^T\tilde{x}\ge 0$. Denote by $\tilde{\mu}_2$ and $\tilde{S}_2$ the sample mean and covariance matrix of the poisoned data with label $C_2$. Then there exists some large constant $n_0$ so that when $n\geq n_0$, $d_M^2(\tilde{x},\tilde{C}_2)$ satisfies ...
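
To make the quantities in the theorem concrete, the sketch below computes a triggered sample $\tilde{x}=x+\epsilon t/\|t\|_2$ and its squared Mahalanobis distance to a class, using the standard definition $d_M^2(x, C)=(x-\mu)^T S^{-1}(x-\mu)$ with the sample mean and covariance of that class. It does not reproduce the bound in the theorem. The function names are hypothetical, and the use of a pseudo-inverse (to tolerate a singular covariance) is an assumption of this sketch.

```python
import numpy as np

def apply_trigger(x, t, eps):
    """Return x + eps * t / ||t||_2, the triggered sample from the theorem."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    return x + eps * t / np.linalg.norm(t)

def mahalanobis_sq(x, class_samples):
    """Squared Mahalanobis distance from x to the class given by its samples."""
    X = np.asarray(class_samples, dtype=float)
    mu = X.mean(axis=0)              # sample mean of the class
    S = np.cov(X, rowvar=False)      # sample covariance (rows are samples)
    diff = np.asarray(x, dtype=float) - mu
    # Assumption: pseudo-inverse instead of inverse, for numerical safety.
    return float(diff @ np.linalg.pinv(np.atleast_2d(S)) @ diff)

# Toy class with mean (1, 1) and covariance (4/3) * I.
class_pts = [[0, 0], [2, 0], [0, 2], [2, 2]]
x_tilde = apply_trigger([1.0, 1.0], t=[3.0, 4.0], eps=1.0)
print(x_tilde, mahalanobis_sq(x_tilde, class_pts))
```

A larger distance indicates a sample that stands out more from the class statistics, which is the sense in which the summary above describes poisoned samples as more or less detectable.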

Figures (11)

  • Figure 1: Latent space visualization of BadNet and Blend via Random and Boundary sampling.
  • Figure 2: The left two figures depict the distribution of $d_o$ when samples are Randomly selected by BadNet and Blend. The right two figures show the relationship between $d_o$ and $d_t$ for BadNet and Blend.
  • Figure 3: Backdoor on SVM
  • Figure 4: An illustration of the existence of a hard margin. The shaded area represents the region of $\tilde{x}$ where a hard margin exists.
  • Figure 5: An illustration of the influence of $\epsilon$ in CBS when applied to BadNet. The magenta bar represents ASR without defenses while the left bars present ASR under defenses.
  • ...and 6 more figures

Theorems & Definitions (12)

  • Definition 4.1: Confidence-based boundary samples
  • Theorem 4.2: Mahalanobis distance
  • Theorem 4.3: Attack success rate, population
  • Remark 4.4: Existence of hard margin
  • Theorem 4.5: Finite-sample scenario
  • Proposition 4.6: CBS in finite-sample scenario
  • Proposition 4.7: Random poisoning in finite-sample scenario
  • Remark 4.8: Effectiveness-stealthiness trade-off
  • Remark 4.9: Hard margin does not exist
  • Proof: Proof of Theorem \ref{theorem1}
  • ...and 2 more