Stealthy Backdoor Attack via Confidence-driven Sampling

Pengfei He; Yue Xing; Han Xu; Jie Ren; Yingqian Cui; Shenglai Zeng; Jiliang Tang; Makoto Yamada; Mohammad Sabokrou

TL;DR

This work addresses stealth in backdoor attacks by demonstrating that random poisoning sampling is detectable and proposing Confidence-driven Boundary Sampling (CBS) to select boundary-adjacent, low-confidence samples using a surrogate model. CBS is designed to be compatible with a wide range of trigger designs and backdoor techniques, and it is supported by theoretical (SVM-based) and empirical analyses showing reduced detectability and maintained attack efficacy under defenses. Across CIFAR-10/100, Tiny ImageNet, and ImageNet-1k, CBS consistently improves resilience against both detection-based and non-detection-based defenses, with clear tradeoffs controlled by the boundary-uncertainty parameter $\epsilon$ and poisoning rate. The results imply that sampling strategy is a crucial lever in backdoor stealth and that CBS can adapt to real-world constraints, including partial data access, while highlighting opportunities to balance effectiveness and stealth in future defenses. Overall, the paper delivers a novel, versatile sampling mechanism that enhances backdoor stealthiness and motivates more robust defense strategies against boundary-adjacent poisoning.

Abstract

Backdoor attacks aim to surreptitiously insert malicious triggers into DNN models, granting unauthorized control at test time. Existing methods lack robustness against defense strategies and predominantly focus on enhancing trigger stealthiness while selecting poisoned samples at random. Our research highlights the overlooked drawbacks of random sampling, which make the attack easier to detect and defend against. The core idea of this paper is to strategically poison samples near the model's decision boundary, increasing the difficulty of defense. We introduce a straightforward yet highly effective sampling methodology that leverages confidence scores. Specifically, it selects samples with lower confidence scores, making it significantly harder for defenders to identify and counter these attacks. Importantly, our method operates independently of existing trigger designs, providing versatility and compatibility with various backdoor attack techniques. We substantiate the effectiveness of our approach through a comprehensive set of empirical experiments, demonstrating its potential to significantly enhance the resilience of backdoor attacks against defenses in DNNs.
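
The selection step described above can be illustrated with a minimal sketch. It assumes that a sample counts as boundary-adjacent when the surrogate model's confidence in its true class and in the attacker's target class differ by at most $\epsilon$; this is one plausible reading of "lower confidence scores", not a statement of the paper's exact rule. The function name `select_boundary_samples`, the `budget` argument, and the tie-breaking by smallest gap are hypothetical choices made for illustration.

```python
import numpy as np

def select_boundary_samples(probs, labels, target, eps, budget):
    """Pick indices of samples to poison (illustrative sketch only).

    probs  : (n, k) softmax outputs of a surrogate model (assumed given)
    labels : (n,) true class indices
    target : attacker's target class index
    eps    : maximum allowed confidence gap (boundary-uncertainty parameter)
    budget : maximum number of samples to return (poisoning budget)
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    idx = np.arange(len(labels))
    # Gap between confidence in the true class and in the target class.
    gap = np.abs(probs[idx, labels] - probs[idx, target])
    # Candidates: samples not already in the target class whose gap is small.
    candidates = idx[(labels != target) & (gap <= eps)]
    # Assumption: when there are more candidates than the budget allows,
    # keep the ones closest to the boundary (smallest gap first).
    order = candidates[np.argsort(gap[candidates], kind="stable")]
    return order[:budget]
```

A random sampler would instead draw `budget` indices uniformly from the non-target samples; the sketch differs only in restricting the pool to samples the surrogate model is unsure about.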

Paper Structure (41 sections, 5 theorems, 29 equations, 11 figures, 14 tables, 2 algorithms)


Introduction
Related works
Backdoor attacks and defenses
Samplings in backdoor attacks
Definition and Notation
Threat model
A general pipeline for backdoor attacks
Method
Revisit random sampling
Confidence-driven boundary sampling (CBS)
Theoretical understandings
Experiment
Experimental settings
Performance of CBS in Type I backdoor attacks
Performance of CBS in Type II backdoor attacks
...and 26 more sections

Key Result

Theorem 4.2

Assume $\tilde{x}=x+\epsilon t/\|t\|_2:=x+a$ for some arbitrary $x$, some trigger $t$, and strength $\epsilon$. Also assume $\mu_2=0$ and $(\mu_2-\mu_1)^T\tilde{x}\ge 0$. Denote by $\tilde{\mu}_2$ and $\tilde{S}_2$ the sample mean and covariance matrix of the poisoned data with label $C_2$. Then there exists some large constant $n_0$ so that when $n\geq n_0$, $d_M^2(\tilde{x},\tilde{C}_2)$ sat
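
The quantity $d_M^2(\tilde{x},\tilde{C}_2)$ in the theorem is a squared Mahalanobis distance built from the sample mean $\tilde{\mu}_2$ and covariance $\tilde{S}_2$. The sketch below computes it under the assumption that the standard definition $(\tilde{x}-\tilde{\mu}_2)^T\tilde{S}_2^{-1}(\tilde{x}-\tilde{\mu}_2)$ is intended. The function name `squared_mahalanobis` and the small `ridge` term (added so the covariance stays invertible) are illustrative assumptions, not part of the paper.

```python
import numpy as np

def squared_mahalanobis(x, class_samples, ridge=1e-6):
    """Squared Mahalanobis distance from x to a class (illustrative sketch).

    x             : (d,) query point, e.g. a poisoned sample's features
    class_samples : (n, d) samples carrying the class label
    ridge         : hypothetical regularizer keeping the covariance invertible
    """
    X = np.asarray(class_samples, dtype=float)
    mu = X.mean(axis=0)                           # sample mean
    S = np.atleast_2d(np.cov(X, rowvar=False))    # sample covariance (n - 1 normalization)
    S = S + ridge * np.eye(X.shape[1])
    diff = np.asarray(x, dtype=float) - mu
    # Solve S z = diff rather than forming the inverse explicitly.
    return float(diff @ np.linalg.solve(S, diff))
```

A distance-based detector would flag samples whose value is unusually large relative to the rest of the class, which is why a smaller distance for poisoned samples corresponds to a stealthier attack.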

Figures (11)

Figure 1: Latent space visualization of BadNet and Blend via Random and Boundary sampling.
Figure 2: The left two figures depict the distribution of $d_o$ when samples are randomly selected by BadNet and Blend. The right two figures show the relationship between $d_o$ and $d_t$ for BadNet and Blend.
Figure 3: Backdoor on SVM
Figure 4: An illustration of the existence of a hard margin. The shaded area represents the region of $\tilde{x}$ where a hard margin exists.
Figure 5: An illustration of the influence of $\epsilon$ in CBS when applied to BadNet. The magenta bar represents ASR without defenses, while the other bars present ASR under defenses.
...and 6 more figures

Theorems & Definitions (12)

Definition 4.1: Confidence-based boundary samples
Theorem 4.2: Mahalanobis distance
Theorem 4.3: Attack success rate, population
Remark 4.4: Existence of hard margin
Theorem 4.5: Finite-sample scenario
Proposition 4.6: CBS in finite-sample scenario
Proposition 4.7: Random poisoning in finite-sample scenario
Remark 4.8: Effectiveness-stealthiness trade-off
Remark 4.9: Hard margin does not exist
Proof: Proof of Theorem \ref{theorem1}
...and 2 more
