Attention via Synaptic Plasticity is All You Need: A Biologically Inspired Spiking Neuromorphic Transformer

Kallol Mondal, Ankush Kumar

TL;DR

The paper tackles the high energy cost of Transformer attention with a neuromorphic Spiking STDP Transformer (S²TDPT), in which attention emerges from spike-timing-dependent plasticity (STDP) rather than from dot-product computation. The architecture has four components (Spiking Patch Splitting, STDP-based Self-Attention, an MLP, and temporal aggregation) and processes inputs as spike trains using addition-only operations and in-memory synaptic updates. The method reaches top-1 accuracies of 94.35% on CIFAR-10 and 78.08% on CIFAR-100 with four timesteps, at an energy cost of 0.49 mJ on CIFAR-100, which corresponds to substantial energy reductions relative to both spiking and conventional Transformers. The work demonstrates object-centric interpretability via Grad-CAM and spike-rate maps, argues for the practicality of neuromorphic deployment, and suggests extensions to larger datasets and to hardware implementations built on STDP-capable devices. Overall, S²TDPT offers a biologically grounded, hardware-friendly path toward energy-efficient, explainable neuromorphic Transformers with strong image classification performance.
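The "addition-only" claim rests on a general property of spiking networks: when the input to a weight layer is a binary spike vector, the matrix-vector product reduces to summing the weight columns whose inputs fired. The Python sketch below illustrates this under stated assumptions. The integrate-and-fire encoder `if_encode` (threshold 1.0, soft reset), the helper names `dense_matvec`, `spike_matvec`, and `aggregate`, and the choice of summing layer outputs over the four timesteps as a stand-in for temporal aggregation are all hypothetical; the paper's Spiking Patch Splitting and aggregation steps may differ.

```python
from typing import List


def if_encode(x: List[float], timesteps: int = 4, threshold: float = 1.0) -> List[List[int]]:
    """Hypothetical integrate-and-fire encoder: each input charges a membrane
    potential every timestep and emits a spike (1) when it crosses threshold."""
    v = [0.0] * len(x)
    train = []
    for _ in range(timesteps):
        spikes = []
        for n, xn in enumerate(x):
            v[n] += xn
            if v[n] >= threshold:
                spikes.append(1)
                v[n] -= threshold  # soft reset: keep the residual charge
            else:
                spikes.append(0)
        train.append(spikes)
    return train


def dense_matvec(w: List[List[float]], x: List[float]) -> List[float]:
    """Conventional multiply-accumulate (MAC) product y = W x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in w]


def spike_matvec(w: List[List[float]], s: List[int]) -> List[float]:
    """Same product for a binary spike vector: only the columns of W whose
    input fired are accumulated, so no multiplications are needed."""
    active = [j for j, spike in enumerate(s) if spike]
    return [sum(row[j] for j in active) for row in w]


def aggregate(w: List[List[float]], train: List[List[int]]) -> List[float]:
    """Hypothetical temporal aggregation: sum the layer output over timesteps."""
    total = [0.0] * len(w)
    for s in train:
        y = spike_matvec(w, s)
        total = [a + b for a, b in zip(total, y)]
    return total


W = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]  # hypothetical 2x3 weight matrix
train = if_encode([1.0, 0.5, 0.0], timesteps=4)
print(train)                # [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 1, 0]]
print(aggregate(W, train))  # [8.0, 0.0]
```

Because each timestep only accumulates weights, the aggregated result equals W applied to the per-input spike counts ([4, 2, 0] here), which is the usual rate-coding view of a short spike train.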

Abstract

Attention is the brain's ability to selectively focus on a few specific aspects while ignoring irrelevant ones. This biological principle inspired the attention mechanism in modern Transformers. Transformers now underpin large language models (LLMs) such as GPT, but at the cost of massive training and inference energy and a correspondingly large carbon footprint. While attention in the brain emerges from neural circuits, Transformer attention relies on dot-product similarity to weight elements of the input sequence. Neuromorphic computing, especially spiking neural networks (SNNs), offers a brain-inspired path to energy-efficient intelligence. Despite recent work on attention-based spiking Transformers, the core attention layer remains non-neuromorphic. Current spiking attention (i) relies on dot-product or element-wise similarity suited to floating-point operations rather than event-driven spikes; (ii) keeps attention matrices that suffer from the von Neumann bottleneck, limiting in-memory computing; and (iii) still diverges from brain-like computation. To address these issues, we propose the Spiking STDP Transformer (S²TDPT), a neuromorphic Transformer that implements self-attention through spike-timing-dependent plasticity (STDP), embedding query-key correlations in synaptic weights. STDP, a core mechanism of memory and learning in the brain that is widely studied in neuromorphic devices, naturally enables in-memory computing and supports non-von Neumann hardware. On CIFAR-10 and CIFAR-100, our model achieves 94.35% and 78.08% accuracy, respectively, with only four timesteps; on CIFAR-100 it consumes 0.49 mJ, an 88.47% energy reduction compared with a standard ANN Transformer. Grad-CAM shows that the model attends to semantically relevant regions, enhancing interpretability. Overall, S²TDPT illustrates how biologically inspired attention can yield energy-efficient, hardware-friendly, and explainable neuromorphic models.
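As a quick consistency check on the reported figures, the 88.47% reduction implies a baseline energy for the ANN Transformer. Assuming the reduction is defined as (E_ANN - E_S²TDPT) / E_ANN for the same CIFAR-100 setting (the exact definition is not stated here):

```latex
\text{reduction} = \frac{E_{\text{ANN}} - E_{\text{S}^2\text{TDPT}}}{E_{\text{ANN}}} = 0.8847
\quad\Longrightarrow\quad
E_{\text{ANN}} = \frac{E_{\text{S}^2\text{TDPT}}}{1 - 0.8847} = \frac{0.49\ \text{mJ}}{0.1153} \approx 4.25\ \text{mJ}
```

Since 0.49 mJ is rounded to two decimals, the implied baseline lies roughly between 4.21 and 4.29 mJ.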

Paper Structure

This paper contains 14 sections, 30 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Synaptic similarity computation through spike-timing-dependent plasticity (STDP) and in-memory computing
  • Figure 2: Brain-inspired attention module proposed in this work. The network computes the attention weight matrix from the STDP-captured similarity between the query (Q) and key (K) matrices, then uses it to weight the value (V) matrix by importance.
  • Figure 3: Overview of the Spiking STDP Transformer, comprising Spiking Patch Splitting (SPS), an encoder layer with STDP-based self-attention and a multi-layer perceptron (MLP) module, and a classification head. The STDP self-attention module is proposed in this work.
  • Figure 4: S²TDPT confusion matrix on CIFAR-10, showing per-class classification performance.
  • Figure 5: Visualization results for five classes, each shown with three panels from left to right: (1) the original CIFAR-10 input image, (2) the Spiking Grad-CAM (S²TDPT) visualization, and (3) the Spike Firing Rate (SFR) map. The Grad-CAM heatmaps consistently show sparse, localized activations concentrated on the target objects (e.g., the car body, the dog's head, or the horse's torso), indicating that the model's classification decision, derived from the features of the final Transformer block, relies on discriminative, task-relevant object features.
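The Figure 2 caption describes attention as an STDP-derived similarity between Q and K that is then used to weight V. The paper's exact plasticity rule, time constants, and normalization are not given in this summary, so the Python sketch below is a minimal, hypothetical illustration assuming a standard pair-based exponential STDP window: for every pair of spikes on the same channel, a key (presynaptic) spike at or before a query (postsynaptic) spike potentiates the synapse w[i][j], while the reverse order depresses it. The function names, the parameters `a_plus`, `a_minus`, and `tau`, summing over channels, and the absence of any softmax-style normalization are all assumptions. Because V is binary, applying w to V needs only accumulation.

```python
import math
from typing import List

# Spike tensors are nested lists indexed [t][token][channel] with 0/1 entries.
Spikes = List[List[List[int]]]


def stdp_kernel(dt: int, a_plus: float = 1.0, a_minus: float = 0.5, tau: float = 2.0) -> float:
    """Hypothetical pair-based exponential STDP window, dt = t_post - t_pre.
    Causal pairs (dt >= 0) potentiate; anti-causal pairs (dt < 0) depress."""
    if dt >= 0:
        return a_plus * math.exp(-dt / tau)
    return -a_minus * math.exp(dt / tau)


def stdp_attention_weights(q: Spikes, k: Spikes) -> List[List[float]]:
    """w[i][j]: net STDP change on the synapse from key token j (pre) to
    query token i (post), summed over all spike pairs and all channels."""
    T, N, D = len(q), len(q[0]), len(q[0][0])
    w = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for d in range(D):
                post_times = [t for t in range(T) if q[t][i][d]]
                pre_times = [t for t in range(T) if k[t][j][d]]
                for t_post in post_times:
                    for t_pre in pre_times:
                        w[i][j] += stdp_kernel(t_post - t_pre)
    return w


def apply_attention(w: List[List[float]], v: Spikes) -> List[List[List[float]]]:
    """out[t][i][d] = sum_j w[i][j] * v[t][j][d]; because v is binary this
    reduces to accumulating w[i][j] wherever value token j fires."""
    T, N, D = len(v), len(v[0]), len(v[0][0])
    out = [[[0.0] * D for _ in range(N)] for _ in range(T)]
    for t in range(T):
        for i in range(N):
            for j in range(N):
                for d in range(D):
                    if v[t][j][d]:
                        out[t][i][d] += w[i][j]
    return out


# Toy example: T = 4 timesteps, N = 2 tokens, D = 1 channel.
q = [[[0], [0]], [[1], [0]], [[0], [0]], [[0], [0]]]  # query token 0 fires at t=1
k = [[[1], [0]], [[0], [0]], [[0], [1]], [[0], [0]]]  # key 0 at t=0, key 1 at t=2
v = [[[1], [1]], [[0], [0]], [[0], [0]], [[0], [0]]]  # both value tokens fire at t=0

w = stdp_attention_weights(q, k)
out = apply_attention(w, v)
# w[0][0] > 0 (key 0 fired before query 0), w[0][1] < 0 (key 1 fired after),
# and w[1] stays [0.0, 0.0] because query token 1 never fires.
```

Key 0 fires one step before query 0 and so gains a positive weight, while key 1 fires one step after and receives a smaller negative one. On STDP-capable hardware these weight changes would live in the synapses themselves rather than in a separately stored attention matrix, which is the in-memory computing argument the abstract makes.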