Do Spikes Protect Privacy? Investigating Black-Box Model Inversion Attacks in Spiking Neural Networks
Hamed Poursiami, Ayana Moshruba, Maryam Parsa
TL;DR
This work investigates privacy risks of Spiking Neural Networks (SNNs) under black-box Model Inversion (MI) attacks. It adapts the GAMIN framework to the spiking domain by incorporating rate encoding for inputs and decoding for outputs, and evaluates on MNIST and AT&T Face datasets. Results show that SNNs exhibit stronger resistance to MI attacks than ANNs, with degraded reconstructions, unstable attack convergence, and lower target-model accuracy on inverted samples. The findings suggest that the discrete, temporally distributed nature of spike-based computation, together with encoding/decoding schemes, creates irregular decision boundaries that hinder surrogate modeling, highlighting privacy advantages of neuromorphic architectures and guiding future exploration of alternative encodings and neuromorphic data.
Abstract
As machine learning models become integral to security-sensitive applications, concerns over data leakage from adversarial attacks continue to rise. Model Inversion (MI) attacks pose a significant privacy threat by enabling adversaries to reconstruct training data from model outputs. While MI attacks on Artificial Neural Networks (ANNs) have been widely studied, Spiking Neural Networks (SNNs) remain largely unexplored in this context. Due to their event-driven and discrete computations, SNNs introduce fundamental differences in information processing that may offer inherent resistance to such attacks. A critical yet underexplored aspect of this threat lies in black-box settings, where attackers operate through queries without direct access to model parameters or gradients, representing a more realistic adversarial scenario in deployed systems. This work presents the first study of black-box MI attacks on SNNs. We adapt a generative adversarial MI framework to the spiking domain by incorporating rate-based encoding for input transformation and decoding mechanisms for output interpretation. Our results show that SNNs exhibit significantly greater resistance to MI attacks than ANNs, as demonstrated by degraded reconstructions, increased instability in attack convergence, and overall reduced attack effectiveness across multiple evaluation metrics. Further analysis suggests that the discrete and temporally distributed nature of SNN decision boundaries disrupts surrogate modeling, limiting the attacker's ability to approximate the target model.
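To illustrate the rate-based encoding and output decoding mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes Bernoulli rate coding (each input fires at every time step with probability equal to its normalized intensity) and spike-count decoding at the output; the function names `rate_encode` and `rate_decode`, the `num_steps` parameter, and the fixed seed are hypothetical choices made for illustration only.

```python
import random


def rate_encode(pixels, num_steps, seed=0):
    """Assumed Bernoulli rate coding: at each time step, input i emits a
    spike (1) with probability pixels[i], where pixels[i] lies in [0, 1]."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for p in pixels]
            for _ in range(num_steps)]


def rate_decode(output_spikes):
    """Assumed spike-count decoding: return the index of the output neuron
    that fired most often, plus each neuron's firing rate over time."""
    num_steps = len(output_spikes)
    counts = [sum(neuron) for neuron in zip(*output_spikes)]
    rates = [c / num_steps for c in counts]
    predicted = max(range(len(counts)), key=counts.__getitem__)
    return predicted, rates


# Encode three normalized pixel intensities over 100 time steps.
pixels = [0.0, 0.5, 1.0]
spikes = rate_encode(pixels, num_steps=100)

# Decode a toy output spike train (3 time steps, 3 output neurons).
predicted, rates = rate_decode([[0, 1, 0], [0, 1, 1], [0, 1, 0]])
```

Under these assumptions, a black-box attacker only ever observes the decoded quantities (a predicted class or firing rates), while the stochastic spike trains in between stay hidden, which is one way to picture why approximating the target with a surrogate could be harder than for an ANN.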
