Xpikeformer: Hybrid Analog-Digital Hardware Acceleration for Spiking Transformers

Zihang Song, Prabodh Katti, Osvaldo Simeone, Bipin Rajendran

TL;DR

This work addresses the energy and data-movement bottlenecks of transformer models by introducing Xpikeformer, a hybrid analog-digital accelerator tailored for spiking transformers. It combines an analog in-memory computing (AIMC) engine for static-weight layers with a stochastic spiking attention (SSA) engine that computes attention with low-energy logic gates instead of heavy digital arithmetic. Key innovations include row-block-wise weight mapping to minimize non-binary activations, non-volatile memory (NVM) crossbars based on phase-change memory (PCM) for efficient matrix-vector multiplications (MVMs), a Bernoulli-neuron-driven SSA, and a hardware-aware training framework with global drift compensation. Empirical results show up to $13\times$ energy reduction over a state-of-the-art (SOTA) digital ANN accelerator at approximately the same throughput and up to $1.9\times$ energy reduction over the optimal digital ASIC projection of SOTA SNN-based transformers, along with latency advantages, while maintaining competitive accuracy on image classification and wireless symbol-detection tasks. These findings suggest a practical pathway for energy-efficient edge deployment of spiking transformers and highlight areas for future crossbar improvements and hardware refinements.

Abstract

The integration of neuromorphic computing and transformers through spiking neural networks (SNNs) offers a promising path to energy-efficient sequence modeling, with the potential to overcome the energy-intensive nature of artificial neural network (ANN)-based transformers. However, the algorithmic efficiency of SNN-based transformers cannot be fully exploited on GPUs due to architectural incompatibility. This paper introduces Xpikeformer, a hybrid analog-digital hardware architecture designed to accelerate SNN-based transformer models. The architecture integrates analog in-memory computing (AIMC) for feedforward and fully connected layers, and a stochastic spiking attention (SSA) engine for efficient attention mechanisms. We detail the design, implementation, and evaluation of Xpikeformer, demonstrating significant improvements in energy consumption and computational efficiency. On image classification and wireless communication symbol detection tasks, we show that Xpikeformer can achieve inference accuracy comparable to the GPU implementation of ANN-based transformers. Evaluations reveal that Xpikeformer achieves a $13\times$ reduction in energy consumption at approximately the same throughput as the state-of-the-art (SOTA) digital accelerator for ANN-based transformers. Additionally, Xpikeformer achieves up to a $1.9\times$ energy reduction compared to the optimal digital ASIC projection of SOTA SNN-based transformers.
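
To make the idea of attention "via low-energy logic gates" more concrete, the following minimal sketch illustrates the generic stochastic-computing principle that such a design can build on: if two values in [0, 1] are encoded as independent Bernoulli spike streams, a bitwise AND of the streams has a spike rate equal to their product. This is an illustration only, not the paper's SSA circuit. The function names (`bernoulli_stream`, `and_multiply`, `rate`), the stream length `T`, and the example rates 0.6 and 0.5 are all assumptions chosen for the example.

```python
import random


def bernoulli_stream(p, length, rng):
    """Encode a probability p in [0, 1] as a list of 0/1 spikes (hypothetical helper)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def and_multiply(stream_a, stream_b):
    """Bitwise AND of two spike streams.

    For independent Bernoulli streams with rates p_a and p_b,
    the output stream has expected rate p_a * p_b.
    """
    return [a & b for a, b in zip(stream_a, stream_b)]


def rate(stream):
    """Fraction of time steps that carry a spike."""
    return sum(stream) / len(stream)


rng = random.Random(0)   # fixed seed so the run is repeatable
T = 20000                # number of time steps (assumed; longer streams reduce variance)

q = bernoulli_stream(0.6, T, rng)  # stands in for a query-like spike train
k = bernoulli_stream(0.5, T, rng)  # stands in for a key-like spike train

estimate = rate(and_multiply(q, k))
print(f"AND-gate estimate: {estimate:.3f} (exact product: {0.6 * 0.5:.3f})")
```

The estimate converges to the exact product as `T` grows, which is the usual trade-off in stochastic computing: each multiplication costs a single gate, but precision is paid for in the number of time steps.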

Paper Structure (38 sections, 7 equations, 10 figures, 6 tables, 1 algorithm)

This paper contains 38 sections, 7 equations, 10 figures, 6 tables, 1 algorithm.

Figures (10)

  • Figure 1: (a) Comparison of an artificial neuron (left) and a spiking neuron (right). (b) A simplified block diagram of a spiking transformer.
  • Figure 2: Illustration of a typical implementation of a synaptic array [peng2020dnn+] with spike-encoded signals as input.
  • Figure 3: The overall system architecture of Xpikeformer.
  • Figure 4: An illustration of the row-block-wise mapping strategy.
  • Figure 5: Block diagram of an $N\times N$ SSA tile.
  • ...and 5 more figures