Attention via Synaptic Plasticity is All You Need: A Biologically Inspired Spiking Neuromorphic Transformer

Kallol Mondal; Ankush Kumar

TL;DR

The paper tackles the high energy cost of Transformer attention by presenting a neuromorphic Spiking STDP Transformer (S²TDPT) in which attention emerges from spike-timing-dependent plasticity (STDP) rather than from dot-product computations. It introduces four components (Spiking Patch Splitting, STDP-based Self-Attention, an MLP, and temporal aggregation) that process inputs as spike trains using addition-only operations and in-memory synaptic updates. The method achieves CIFAR-10 and CIFAR-100 top-1 accuracies of 94.35% and 78.08%, respectively, at four timesteps, and consumes only 0.49 mJ on CIFAR-100, corresponding to substantial energy reductions over both spiking and conventional Transformers (88.47% relative to a standard ANN Transformer). The work demonstrates object-centric interpretability via Grad-CAM and spike-rate maps, argues that the design is practical for neuromorphic deployment, and suggests future extensions to larger datasets and to hardware implementations built on STDP-capable devices. Overall, S²TDPT offers a biologically grounded, hardware-friendly path toward energy-efficient, explainable neuromorphic Transformers with strong empirical performance on image classification.
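
The contrast between STDP-derived and dot-product attention can be illustrated with a small sketch. It is not the S²TDPT implementation: it assumes a generic pair-based exponential STDP window, treats key spikes as presynaptic and query spikes as postsynaptic, and uses hypothetical names (`stdp_kernel`, `stdp_attention`) and illustrative parameter values (`a_plus`, `a_minus`, `tau_plus`, `tau_minus`). The attention weight between query token i and key token j accumulates one kernel value per pair of spikes sharing a feature channel, and each output sums those weights over whichever value spikes are active.

```python
import numpy as np


def stdp_kernel(dt, a_plus=1.0, a_minus=0.5, tau_plus=2.0, tau_minus=2.0):
    """Generic pair-based exponential STDP window (assumed, illustrative values).

    dt = t_query - t_key. A key spike that precedes (or coincides with) a query
    spike (dt >= 0) potentiates; a key spike that follows it (dt < 0) depresses.
    """
    dt = np.asarray(dt, dtype=float)
    mag = np.abs(dt)  # use |dt| so neither branch can overflow
    return np.where(dt >= 0,
                    a_plus * np.exp(-mag / tau_plus),
                    -a_minus * np.exp(-mag / tau_minus))


def stdp_attention(q, k, v, **kernel_kw):
    """Hypothetical STDP-style attention over binary spike trains.

    q, k, v: 0/1 arrays of shape (T, N, D) = (timesteps, tokens, channels).
    Returns (weights, out): weights has shape (N, N), out has shape (T, N, D).
    """
    T = q.shape[0]
    t = np.arange(T)
    # Lookup table: table[t_q, t_k] = kernel(t_q - t_k).
    table = stdp_kernel(t[:, None] - t[None, :], **kernel_kw)
    # weights[i, j] = sum over t_q, t_k, d of q[t_q, i, d] * table[t_q, t_k] * k[t_k, j, d].
    # With 0/1 spikes each product just keeps or drops one table entry.
    weights = np.einsum("aid,ab,bjd->ij", q, table, k)
    # out[t, i, d] = sum_j weights[i, j] * v[t, j, d]: binary v selects weights to add.
    out = np.einsum("ij,tjd->tid", weights, v)
    return weights, out


# Usage: query 0 fires at t=2; key 0 fires earlier (t=1), key 1 later (t=3).
q = np.zeros((4, 2, 1)); q[2, 0, 0] = 1
k = np.zeros((4, 2, 1)); k[1, 0, 0] = 1; k[3, 1, 0] = 1
v = np.zeros((4, 2, 1)); v[0, :, 0] = 1
weights, out = stdp_attention(q, k, v)
print(weights)
```

Because the spike trains are binary, the `einsum` calls only select and add precomputed table entries; they are written with multiplications purely for NumPy brevity, whereas event-driven hardware could realize the same sums with additions alone. The sketch also omits any normalization (such as the softmax of standard attention), since the text here does not specify one.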

Abstract

Attention is the brain's ability to selectively focus on a few specific aspects while ignoring irrelevant ones. This biological principle inspired the attention mechanism in modern Transformers. Transformers now underpin large language models (LLMs) such as GPT, but at the cost of massive training and inference energy, leading to a large carbon footprint. While brain attention emerges from neural circuits, Transformer attention relies on dot-product similarity to weight elements in the input sequence. Neuromorphic computing, especially spiking neural networks (SNNs), offers a brain-inspired path to energy-efficient intelligence. Despite recent work on attention-based spiking Transformers, the core attention layer remains non-neuromorphic. Current spiking attention (i) relies on dot-product or element-wise similarity suited to floating-point operations, not event-driven spikes; (ii) keeps attention matrices that suffer from the von Neumann bottleneck, limiting in-memory computing; and (iii) still diverges from brain-like computation. To address these issues, we propose the Spiking STDP Transformer (S²TDPT), a neuromorphic Transformer that implements self-attention through spike-timing-dependent plasticity (STDP), embedding query-key correlations in synaptic weights. STDP, a core mechanism of memory and learning in the brain and widely studied in neuromorphic devices, naturally enables in-memory computing and supports non-von Neumann hardware. On CIFAR-10 and CIFAR-100, our model achieves 94.35% and 78.08% accuracy, respectively, with only four timesteps, and consumes 0.49 mJ on CIFAR-100, an 88.47% energy reduction compared to a standard ANN Transformer. Grad-CAM shows that the model attends to semantically relevant regions, enhancing interpretability. Overall, S²TDPT illustrates how biologically inspired attention can yield energy-efficient, hardware-friendly, and explainable neuromorphic models.
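
As a quick consistency check on the reported figures: if the reduction is defined as 1 - E_S²TDPT / E_ANN, the 88.47% saving implies a baseline ANN Transformer energy of roughly 0.49 / (1 - 0.8847) ≈ 4.25 mJ on CIFAR-100. That baseline is derived here rather than quoted from the abstract, the definition of the reduction is an assumption, and rounding in the reported numbers makes the value approximate; the variable names are illustrative.

```python
# Values reported in the abstract (CIFAR-100, four timesteps).
s2tdpt_energy_mj = 0.49
reduction = 0.8847  # 88.47% energy reduction vs. a standard ANN Transformer

# Assumed definition: reduction = 1 - E_s2tdpt / E_ann  =>  E_ann = E_s2tdpt / (1 - reduction).
# Derived, approximate value; not reported directly in the abstract.
implied_ann_energy_mj = s2tdpt_energy_mj / (1 - reduction)
print(f"Implied ANN Transformer energy: {implied_ann_energy_mj:.2f} mJ")  # prints "Implied ANN Transformer energy: 4.25 mJ"
```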
