PECAN: A Deterministic Certified Defense Against Backdoor Attacks

Yuhao Zhang, Aws Albarghouthi, Loris D'Antoni

TL;DR

PECAN presents a deterministic, certified defense against backdoor attacks by partitioning training data into disjoint subsets, training an ensemble of models, and applying evasion certification to each model. Aggregation of certified predictions yields a final decision with a provable backdoor-robust radius, while abstaining when certification cannot be achieved. Empirical results on MNIST, CIFAR10, and EMBER show PECAN outperforming state-of-the-art probabilistic defenses in certified accuracy and drastically reducing backdoor attack success rates under BadNets and XBA, with substantially lower computation time. The approach offers a practical, scalable path to robust ML in security-sensitive settings, though it faces limitations in radius size and applicability to very large datasets, suggesting future work on efficiency, larger models, and collaborative certification strategies.
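The partition, train, certify, and aggregate steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the hash-based split, the `train` and `certify` callables, and the function names are all hypothetical. The radius rule is a conservative bound derived from the worst case in which every abstaining model and every poisoned partition votes for the runner-up label; it demands a strict majority margin and may differ from the exact formula in Theorem 4.1.

```python
import hashlib
from collections import Counter

ABSTAIN = None  # stands in for the paper's "diamond" (no certificate) outcome


def partition(dataset, n_parts):
    """Deterministically split examples into disjoint parts by hashing each one.

    Assumption: a SHA-256 hash of repr(example) picks the part; the paper's
    actual partitioning function is not given in this summary.
    """
    parts = [[] for _ in range(n_parts)]
    for example in dataset:
        digest = hashlib.sha256(repr(example).encode()).digest()
        parts[int.from_bytes(digest[:8], "big") % n_parts].append(example)
    return parts


def aggregate(certified_votes):
    """Combine per-model outcomes into (label, radius).

    Each vote is a label if that model's prediction was certified against
    test-time triggers, or ABSTAIN otherwise. Returns (ABSTAIN, ABSTAIN)
    when no radius >= 0 can be certified.
    """
    n_abs = sum(1 for v in certified_votes if v is ABSTAIN)
    counts = Counter(v for v in certified_votes if v is not ABSTAIN)
    if not counts:
        return ABSTAIN, ABSTAIN
    ranked = counts.most_common(2)
    top, n_top = ranked[0]
    n_runner = ranked[1][1] if len(ranked) > 1 else 0
    # Worst case (assumed): r poisoned partitions flip from `top` to the
    # runner-up, and all abstaining models also vote for the runner-up.
    # Require a strict margin: n_top - r > n_runner + r + n_abs.
    radius = (n_top - n_runner - n_abs - 1) // 2
    if radius < 0:
        return ABSTAIN, ABSTAIN
    return top, radius


def pecan_predict(dataset, x, n_parts, train, certify):
    """End-to-end sketch; `train` and `certify` are caller-supplied callables.

    train(part) -> model
    certify(model, x) -> certified label, or ABSTAIN
    """
    models = [train(part) for part in partition(dataset, n_parts)]
    return aggregate([certify(model, x) for model in models])
```

In this sketch, `aggregate([0, 0, 0, 0, 0, 1, ABSTAIN])` yields label `0` with radius `1`: five certified votes for the top label, one for the runner-up, and one abstention leave room for a single poisoned partition under the strict-margin assumption.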

Abstract

Neural networks are vulnerable to backdoor poisoning attacks, where attackers maliciously poison the training set and insert triggers into the test input to change the prediction of the victim model. Existing defenses for backdoor attacks either provide no formal guarantees or come with expensive-to-compute and ineffective probabilistic guarantees. We present PECAN, an efficient and certified approach for defending against backdoor attacks. The key insight powering PECAN is to apply off-the-shelf test-time evasion certification techniques on a set of neural networks trained on disjoint partitions of the data. We evaluate PECAN on image classification and malware detection datasets. Our results demonstrate that PECAN can (1) significantly outperform the state-of-the-art certified backdoor defense, both in defense strength and efficiency, and (2) on real backdoor attacks, reduce the attack success rate by an order of magnitude compared to a range of baselines from the literature.

Paper Structure (60 sections, 2 theorems, 15 equations, 6 figures, 6 tables)

Key Result

Theorem 4.1

Given a dataset $D$ and a test input $\mathbf{x}$, PECAN computes a prediction $\bar{A}_{D}(\mathbf{x})$ and a certified radius $r$. Then, either $r=\diamond$ or

Figures (6)

  • Figure 1: An overview of our approach PECAN.
  • Figure 2: Comparison to BagFlip on CIFAR10, EMBER and MNIST, showing the normal accuracy (dotted lines) and the certified accuracy (solid lines) at different modification amounts $R$.
  • Figure 3: An illustration of the proof of the main theorem. It shows the worst case for PECAN, where the attacker can change all predictions in $D_\mathrm{abs}$ and $D_\mathrm{bd}$ to the runner-up label $y'$. Note that we group $D_\mathrm{abs}$, $D_\mathrm{bd}$, and $D_\mathrm{safe}$ together to ease illustration.
  • Figure 4: Comparison to BagFlip on CIFAR10, EMBER, and MNIST, showing the normal accuracy (dotted lines) and the certified accuracy (solid lines) at different modification amounts $R$. For CIFAR10: $a=50$ and $b=100$. For EMBER: $a=200$ and $b=400$.
  • Figure 5: Results of PECAN on CIFAR10 and EMBER, showing the normal accuracy (dotted lines) and the certified accuracy (solid lines) at different modification amounts $R$. For MNIST: $a=600$ and $b=1200$. For CIFAR10: $a=50$ and $b=100$.
  • ...and 1 more figure
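
The figure captions above report normal accuracy and certified accuracy at different modification amounts $R$. One common way to compute these two quantities from per-input results is sketched below. The function name and the tuple layout are hypothetical, and the sketch assumes that $R$ is compared directly against the certified radius and that an abstention counts as incorrect; the paper may express $R$ on a different scale.

```python
def normal_and_certified_accuracy(results, R):
    """Return (normal_accuracy, certified_accuracy) over a test set.

    results: list of (prediction, radius, true_label) tuples, where
    prediction and radius are None when the defense abstains.
    Assumption: an input is certified-correct at R when its prediction
    matches the label and its certified radius is at least R.
    """
    n = len(results)
    correct = [(p, r) for p, r, y in results if p is not None and p == y]
    certified = [1 for _, r in correct if r >= R]
    return len(correct) / n, len(certified) / n


# Four hypothetical test inputs: two correct (radii 3 and 1),
# one wrong, and one abstention.
example = [(0, 3, 0), (1, 1, 1), (0, 2, 1), (None, None, 0)]
normal_acc, certified_acc = normal_and_certified_accuracy(example, R=2)
```

With these four inputs, half are predicted correctly, and only the first is both correct and certified at $R = 2$.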

Theorems & Definitions (7)

  • Example 3.1
  • Example 3.2
  • Remark 3.1
  • Theorem 4.1: Soundness of PECAN
  • Theorem 4.2: Soundness of PECAN under backdoored data
  • proof
  • proof