PECAN: A Deterministic Certified Defense Against Backdoor Attacks
Yuhao Zhang, Aws Albarghouthi, Loris D'Antoni
TL;DR
PECAN is a deterministic, certified defense against backdoor attacks: it partitions the training data into disjoint subsets, trains one model per subset, and applies evasion certification to each model. Aggregating the certified predictions yields a final decision with a provable backdoor-robust radius; PECAN abstains when certification cannot be achieved. Empirical results on MNIST, CIFAR10, and EMBER show PECAN outperforming state-of-the-art probabilistic defenses in certified accuracy and sharply reducing backdoor attack success rates under BadNets and XBA, with substantially lower computation time. The approach offers a practical, scalable path to robust ML in security-sensitive settings, though it is limited in radius size and in applicability to very large datasets, which suggests future work on efficiency, larger models, and collaborative certification strategies.
Abstract
Neural networks are vulnerable to backdoor poisoning attacks, in which an attacker poisons the training set and inserts a trigger into the test input to change the victim model's prediction. Existing defenses against backdoor attacks either provide no formal guarantees or come with probabilistic guarantees that are expensive to compute and ineffective. We present PECAN, an efficient and certified approach for defending against backdoor attacks. The key insight powering PECAN is to apply off-the-shelf test-time evasion certification techniques to a set of neural networks trained on disjoint partitions of the data. We evaluate PECAN on image classification and malware detection datasets. Our results demonstrate that PECAN (1) significantly outperforms the state-of-the-art certified backdoor defense in both defense strength and efficiency, and (2) reduces the attack success rate on real backdoor attacks by an order of magnitude compared to a range of baselines from the literature.
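
The partition-and-aggregate idea can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `partition_index` and `aggregate` are hypothetical, the hash-based partitioning is one assumed way to make the split deterministic, and the aggregation rule is a simplified, conservative one rather than the paper's exact rule. Each vote is assumed to be a label when an off-the-shelf evasion certifier has proved that model's prediction stable under the trigger, and `None` (abstain) otherwise. The sketch also assumes each poisoned training example falls into exactly one partition, so it can corrupt at most one model.

```python
import hashlib
from collections import Counter

ABSTAIN = None  # marker for "this model's prediction could not be certified"


def partition_index(example_bytes: bytes, n_partitions: int) -> int:
    """Deterministically assign a training example to one disjoint partition.

    Assumption: hashing the serialized example; any fixed, data-dependent
    assignment that does not look at the other examples would serve here.
    """
    digest = hashlib.sha256(example_bytes).digest()
    return int.from_bytes(digest[:8], "big") % n_partitions


def aggregate(votes):
    """Combine per-model certified votes into (label, radius).

    `votes` holds one entry per ensemble member: an integer label if that
    member's prediction was certified, or ABSTAIN otherwise.
    Returns (ABSTAIN, None) when no prediction can be certified.

    Conservative rule (an assumption of this sketch): every abstaining model
    is counted as voting for the runner-up, and each poisoned example can
    move one vote from the top label to the runner-up, shifting the gap by 2.
    The top label survives s poisoned examples if
        n_top - n_runner - n_abstain > 2 * s.
    """
    counts = Counter(v for v in votes if v is not ABSTAIN)
    n_abstain = sum(1 for v in votes if v is ABSTAIN)
    if not counts:
        return ABSTAIN, None

    # Sort by descending count, breaking ties by label for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top_label, n_top = ranked[0]
    n_runner = ranked[1][1] if len(ranked) > 1 else 0

    margin = n_top - n_runner - n_abstain
    if margin <= 0:
        return ABSTAIN, None

    radius = (margin - 1) // 2  # largest s with margin > 2 * s
    return top_label, radius


if __name__ == "__main__":
    # Ten models: seven certified votes for class 3, one for class 7, two abstain.
    print(aggregate([3, 3, 3, 3, 3, 3, 3, 7, None, None]))
```

In this example the margin is 7 - 1 - 2 = 4, so the sketch returns label 3 with radius 1: under the stated assumptions, the prediction holds for any trigger within the certifier's perturbation bound and for one poisoned training example. A vote pattern such as `[1, 2, None]` leaves no positive margin, and the sketch abstains.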
