Unsupervised Backdoor Detection and Mitigation for Spiking Neural Networks
Jiachen Li, Bang Wu, Xiaoyu Xia, Xiaoning Liu, Xun Yi, Xiuzhen Zhang
TL;DR
This work addresses backdoor threats in Spiking Neural Networks (SNNs) by proposing a full-lifecycle defense tailored to their neuromorphic, event-driven nature. Temporal Membrane Potential Backdoor Detection (TMPBD) provides unsupervised, data-free detection by exploiting maximum margin statistics of the temporal membrane potential (TMP) in the final spiking layer. Neural Dendrites Suppression Backdoor Mitigation (NDSBM) clamps dendritic connections between early convolutional layers to suppress malicious neurons with minimal impact on clean accuracy. Across multiple neuromorphic benchmarks and backdoor variants, TMPBD achieves 100% attack-label detection accuracy, and NDSBM reduces the attack success rate (ASR) from 100% to 8.44% on its own and to 2.81% when combined with detection. The approach advances practical security for SNNs, showing robustness to adaptive attacks and imbalanced data, and offers a scalable, data-efficient defense suitable for deployment in neuromorphic contexts.
Abstract
Spiking Neural Networks (SNNs) have gained increasing attention for their superior energy efficiency compared to Artificial Neural Networks (ANNs). However, their security, particularly under backdoor attacks, has received limited attention. Existing defense methods developed for ANNs perform poorly or can be easily bypassed in SNNs because of the event-driven nature and temporal dependencies of SNNs. This paper identifies the key blockers that hinder traditional backdoor defenses in SNNs and proposes an unsupervised post-training detection framework, Temporal Membrane Potential Backdoor Detection (TMPBD), to overcome these challenges. TMPBD leverages the maximum margin statistics of temporal membrane potential (TMP) in the final spiking layer to detect target labels without any attack knowledge or data access. We further introduce a robust mitigation mechanism, Neural Dendrites Suppression Backdoor Mitigation (NDSBM), which clamps dendritic connections between early convolutional layers to suppress malicious neurons while preserving benign behaviors, guided by TMP extracted from a small, clean, unlabeled dataset. Extensive experiments on multiple neuromorphic benchmarks and state-of-the-art input-aware dynamic trigger attacks demonstrate that TMPBD achieves 100% detection accuracy, while NDSBM reduces the attack success rate from 100% to 8.44%, and to 2.81% when combined with detection, without degrading clean accuracy.
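The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the maximum margin statistics are computed or tested, nor how the clamping bound is chosen. The sketch assumes that final-layer membrane potentials are available as an array of shape (inputs, time steps, classes), that the margin is taken on the time-averaged potential, that an outlier class is flagged with a median-absolute-deviation (MAD) rule, and that the clamp bound is a percentile of clean activations. All function names, the threshold of 3.5, and the 99th percentile are hypothetical choices made for illustration.

```python
import numpy as np


def max_margin_per_class(tmp):
    """Largest time-averaged margin of each class over all inputs.

    tmp: array of shape (num_inputs, num_steps, num_classes) holding
    final-layer membrane potentials (assumed layout, not from the paper).
    """
    tmp = np.asarray(tmp, dtype=float)
    mean_potential = tmp.mean(axis=1)  # average over time: (inputs, classes)
    num_classes = mean_potential.shape[1]
    margins = np.empty(num_classes)
    for k in range(num_classes):
        # Strongest competing class for every input, excluding class k.
        others = np.delete(mean_potential, k, axis=1).max(axis=1)
        margins[k] = np.max(mean_potential[:, k] - others)
    return margins


def flag_target_label(margins, threshold=3.5):
    """Flag the class whose margin is an upper outlier (assumed MAD rule).

    Returns (label or None, per-class anomaly scores).
    """
    margins = np.asarray(margins, dtype=float)
    median = np.median(margins)
    mad = np.median(np.abs(margins - median))
    if mad == 0.0:
        return None, np.zeros_like(margins)
    # 0.6745 rescales the MAD so scores are comparable to z-scores.
    scores = 0.6745 * (margins - median) / mad
    candidate = int(np.argmax(scores))
    label = candidate if scores[candidate] > threshold else None
    return label, scores


def clamp_inputs(activations, clean_activations, percentile=99.0):
    """Clamp early-layer inputs to an upper bound estimated from clean data.

    The percentile bound is an illustrative assumption.
    """
    upper = np.percentile(np.asarray(clean_activations, dtype=float), percentile)
    return np.minimum(np.asarray(activations, dtype=float), upper)


if __name__ == "__main__":
    margins = [1.0, 1.2, 0.9, 9.0, 1.1, 1.0, 0.8, 1.3, 1.05, 0.95]
    label, _ = flag_target_label(margins)
    print("flagged label:", label)
```

In this toy run, class 3 has a margin far above the others and is the one flagged; when no class stands out, the function returns `None`. A real pipeline would obtain the potentials from the SNN under inspection and apply the clamp inside the network's forward pass.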
