Attacking the Spike: On the Transferability and Security of Spiking Neural Networks to Adversarial Examples
Nuo Xu, Kaleel Mahmood, Haowen Fang, Ethan Rathbun, Caiwen Ding, Wujie Wen
TL;DR
The paper studies the robustness of Spiking Neural Networks (SNNs) to adversarial examples, a priority as SNNs gain deployment in energy-constrained settings. It first shows that the effectiveness of white-box attacks on SNNs hinges on the chosen surrogate gradient estimator, then analyzes the transferability of adversarial examples between SNNs, Vision Transformers (ViTs), and CNNs, finding generally low transfer between SNNs and ViTs. To close these gaps, it introduces Mixed Dynamic Spiking Estimation (MDSE), an attack that dynamically selects surrogate gradients and blends gradients from multiple models. Compared with Auto-PGD, MDSE is up to 91.4% more effective on SNN/ViT ensembles and 3x more effective on adversarially trained SNN ensembles. Across CIFAR-10, CIFAR-100, and ImageNet with 19 classifiers, MDSE consistently outperforms existing attacks, underscoring the need for adaptive, multi-model adversarial evaluation and informing defense design for SNN security.
Abstract
Spiking neural networks (SNNs) have drawn much attention for their high energy efficiency and recent advances in classification performance. However, unlike that of traditional deep learning models, the robustness of SNNs to adversarial examples remains underexplored. This work advances the adversarial attack side of SNNs and makes three major contributions. First, we show that successful white-box attacks on SNNs strongly depend on the surrogate gradient estimation technique, even for adversarially trained models. Second, using the best single surrogate gradient estimator, we study the transferability of adversarial examples between SNNs and state-of-the-art architectures such as Vision Transformers (ViTs) and CNNs. Our analysis reveals two major gaps: no existing white-box attack leverages multiple surrogate estimators, and no single attack effectively fools both SNNs and non-SNN models simultaneously. Third, we propose the Mixed Dynamic Spiking Estimation (MDSE) attack, which dynamically combines multiple surrogate gradients to overcome these gaps. MDSE produces adversarial examples that fool both SNN and non-SNN models, achieving up to 91.4% higher effectiveness on SNN/ViT ensembles and a 3x boost on adversarially trained SNN ensembles over Auto-PGD. Experiments span three datasets (CIFAR-10, CIFAR-100, ImageNet) and nineteen classifiers, and we will release code and models upon publication.
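
To illustrate why the surrogate gradient estimator matters for a white-box attack, the following is a minimal sketch rather than the paper's method. A spiking neuron's forward pass is a step function with zero derivative almost everywhere, so an attacker must substitute a smooth surrogate in the backward pass, and different surrogates can give very different gradients for the same membrane potential. The threshold value, the two surrogate shapes (sigmoid derivative and triangular), their parameters, and the `blend` helper are all assumptions chosen for illustration; in particular, `blend` is a plain weighted sum and not the MDSE combination rule.

```python
import math

THRESHOLD = 1.0  # hypothetical firing threshold, chosen for illustration


def spike(v):
    """Forward pass: Heaviside step; emits a spike when v reaches the threshold."""
    return 1.0 if v >= THRESHOLD else 0.0


def sigmoid_surrogate(v, slope=4.0):
    """Surrogate gradient: derivative of a sigmoid centred on the threshold."""
    s = 1.0 / (1.0 + math.exp(-slope * (v - THRESHOLD)))
    return slope * s * (1.0 - s)


def triangular_surrogate(v, width=1.0):
    """Surrogate gradient: tent centred on the threshold, zero beyond +/- width."""
    return max(0.0, 1.0 - abs(v - THRESHOLD) / width) / width


def blend(grads, weights):
    """Weighted sum of gradient estimates (illustrative only, not the MDSE rule)."""
    return sum(w * g for w, g in zip(weights, grads))


if __name__ == "__main__":
    # Compare the two estimators at several membrane potentials.
    for v in (0.2, 0.9, 1.0, 2.5):
        gs = sigmoid_surrogate(v)
        gt = triangular_surrogate(v)
        mixed = blend([gs, gt], [0.5, 0.5])
        print(f"v={v:.1f} spike={spike(v):.0f} "
              f"sigmoid={gs:.4f} tent={gt:.4f} blend={mixed:.4f}")
```

Far from the threshold the triangular surrogate returns exactly zero while the sigmoid surrogate still returns a small positive value, so a gradient-based attack driven by one estimator may stall where another still makes progress. This is the kind of estimator dependence the abstract refers to.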
