Attacking the Spike: On the Transferability and Security of Spiking Neural Networks to Adversarial Examples

Nuo Xu, Kaleel Mahmood, Haowen Fang, Ethan Rathbun, Caiwen Ding, Wujie Wen

TL;DR

The paper studies the robustness of Spiking Neural Networks (SNNs) to adversarial examples, a priority as SNNs gain deployment in energy-constrained settings. It first demonstrates that the effectiveness of white-box attacks on SNNs hinges on the chosen surrogate gradient estimator. It then analyzes cross-model transferability to Vision Transformers (ViTs) and CNNs, finding generally low transfer between SNNs and ViTs. To close these transferability gaps, it introduces Mixed Dynamic Spiking Estimation (MDSE), a multi-component attack that dynamically selects surrogate gradients and blends gradients from multiple models. Compared with Auto-PGD, MDSE achieves up to a 91.4% increase in attack effectiveness on SNN/ViT ensembles and a 3x boost on adversarially trained SNN ensembles. Across CIFAR-10, CIFAR-100, and ImageNet with 19 classifiers, MDSE consistently outperforms existing attacks, underscoring the need for adaptive, multi-model adversarial evaluation and informing defense design for SNN security.

Abstract

Spiking neural networks (SNNs) have drawn much attention for their high energy efficiency and recent advances in classification performance. However, unlike traditional deep learning, the robustness of SNNs to adversarial examples remains underexplored. This work advances the adversarial attack side of SNNs and makes three major contributions. First, we show that successful white-box attacks on SNNs strongly depend on the surrogate gradient estimation technique, even for adversarially trained models. Second, using the best single surrogate gradient estimator, we study the transferability of adversarial examples between SNNs and state-of-the-art architectures such as Vision Transformers (ViTs) and CNNs. Our analysis reveals two major gaps: no existing white-box attack leverages multiple surrogate estimators, and no single attack effectively fools both SNNs and non-SNN models simultaneously. Third, we propose the Mixed Dynamic Spiking Estimation (MDSE) attack, which dynamically combines multiple surrogate gradients to overcome these gaps. MDSE produces adversarial examples that fool both SNN and non-SNN models, achieving up to 91.4% higher effectiveness on SNN/ViT ensembles and a 3x boost on adversarially trained SNN ensembles over Auto-PGD. Experiments span three datasets (CIFAR-10, CIFAR-100, ImageNet) and nineteen classifiers, and we will release code and models upon publication.
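
To make the role of the surrogate gradient concrete, here is a minimal Python sketch. A spiking neuron's forward pass is a step function whose true derivative is zero almost everywhere, so a white-box attacker has to substitute a smooth derivative during backpropagation, and different substitutes yield different attack gradients. The function names, the threshold, and the sharpness parameter `alpha` are illustrative assumptions, not taken from the paper; the arctan and sigmoid shapes are two commonly used surrogates and may be parameterized differently in the authors' experiments.

```python
import math

def spike(v, threshold=1.0):
    """Forward pass: emit a spike (1.0) once the membrane potential reaches the threshold."""
    return 1.0 if v >= threshold else 0.0

def arctan_surrogate_grad(v, threshold=1.0, alpha=2.0):
    """Derivative of g(x) = arctan(pi/2 * alpha * x) / pi + 1/2, with x = v - threshold.
    Used in the backward pass instead of the step function's derivative.
    `alpha` is an assumed sharpness parameter."""
    x = v - threshold
    return (alpha / 2.0) / (1.0 + (math.pi / 2.0 * alpha * x) ** 2)

def sigmoid_surrogate_grad(v, threshold=1.0, alpha=4.0):
    """Derivative of g(x) = sigmoid(alpha * x), a second assumed surrogate."""
    s = 1.0 / (1.0 + math.exp(-alpha * (v - threshold)))
    return alpha * s * (1.0 - s)

if __name__ == "__main__":
    # Same membrane potential, different estimators, different backward signal.
    for v in (0.5, 1.0, 1.5):
        print(v, spike(v), arctan_surrogate_grad(v), sigmoid_surrogate_grad(v))
```

Both surrogates peak at the threshold and decay away from it, but at different rates. That difference in the backward signal is the kind of effect that can make one estimator produce a much stronger attack than another on the same model.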

Paper Structure

This paper contains 29 sections, 31 equations, 6 figures, 19 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Different surrogate gradient functions.
  • Figure 2: White-box attack on SNN models using different surrogate gradients for CIFAR-10, CIFAR-100 and ImageNet. Every curve corresponds to the performance of an attack with a specific surrogate gradient. The y-axis is accuracy, the x-axis is epsilon. For CIFAR-10/100, arctan produces the highest attack success rate. On ImageNet models, PWE performs best. Numerical values of the results are given in Table \ref{tab:whitebox_transfer} (Transfer SNN), Table \ref{tab:whitebox_bp} (BP SNN), Table \ref{tab:whitebox_sewresnet} (SEW ResNet), and Table \ref{tab:whitebox_vanilla_resnet} (Vanilla Spiking ResNet).
  • Figure 3: Attack success rate of Auto-PGD with $\epsilon=0.031$ on adversarially trained SNNs using the best and worst possible Surrogate Gradient Estimator (SG).
  • Figure 4: Visual representation of transferability results for CIFAR-10. Model abbreviations are used for succinctness, S=SNN, R=ResNet, V=VGG-16, C=CNN, BP=Backpropagation, T denotes the Transfer SNN model with corresponding timestep and V-L=ViT-L.
  • Figure 5: Attack success rates of Max MIM, PGD, Auto-PGD, SAGA and MDSE on CIFAR-10, CIFAR-100 and ImageNet for different pairs of SNN and non-SNN models. Sorted by MDSE results in decreasing order.
  • ...and 1 more figure
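
The captions above report results at a fixed perturbation budget (for example $\epsilon=0.031$) and for attacks aimed at an SNN and a non-SNN model together. As a rough illustration only, the following Python sketch shows one signed-gradient step that blends per-model gradients with fixed weights and projects the result back into the $\epsilon$-ball around the clean input. It is a generic sketch, not the MDSE algorithm: the function name, the fixed blending weights, and the step size are hypothetical, and the gradients are passed in as plain lists instead of being computed from real models.

```python
def blended_linf_step(x, x_orig, grads, weights, step=0.005, eps=0.031):
    """One L-infinity attack step on a flat list of pixel values in [0, 1].

    x       -- current adversarial example
    x_orig  -- clean input that defines the center of the eps-ball
    grads   -- one loss-gradient list per model (hypothetical inputs)
    weights -- one blending weight per model (assumed fixed in this sketch)
    """
    x_new = []
    for i, (xi, x0) in enumerate(zip(x, x_orig)):
        # Weighted sum of the per-model gradients for this pixel.
        g = sum(w * grad[i] for w, grad in zip(weights, grads))
        sign = (g > 0) - (g < 0)
        xi = xi + step * sign                    # move in the blended ascent direction
        xi = min(max(xi, x0 - eps), x0 + eps)    # project into the eps-ball
        x_new.append(min(max(xi, 0.0), 1.0))     # keep a valid pixel range
    return x_new

if __name__ == "__main__":
    clean = [0.5, 0.5, 0.0]
    snn_grad = [1.0, -1.0, -1.0]   # placeholder gradient from an SNN
    vit_grad = [0.5, -2.0, -3.0]   # placeholder gradient from a non-SNN model
    adv = clean
    for _ in range(20):
        adv = blended_linf_step(adv, clean, [snn_grad, vit_grad], [0.5, 0.5])
    print(adv)
```

With fixed weights, a model whose gradients are much larger in magnitude can dominate the blended direction, which is one reason an attack might prefer to adapt the weights during optimization.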