VESTA: A Versatile SNN-Based Transformer Accelerator with Unified PEs for Multiple Computational Layers
Ching-Yao Chen, Meng-Chieh Chen, Tian-Sheuan Chang
TL;DR
VESTA tackles the challenge of running transformer-style models on edge devices by unifying convolution, linear, and dot-product computations under a single spike-based Processing Element (PE) design. It introduces four specialized spike-based operations: Zig-Zag Spiking Convolution (ZSC), Shift-and-Sum Spiking Convolution (SSSC), Weight Stationary Spiking Linear Operation (WSSL), and Spiking Tile-wise Dot Product Calculation (STDP). A Temporal Fused Leaky Integrate-and-Fire (TFLIF) module enables multi-timestep processing with reduced memory and data traffic. Hardware results show a 0.844 mm^2 core in 28 nm CMOS running at 500 MHz with 512 PEs and an SRAM footprint of 107 KB, capable of 30 fps on 224×224 RGB images; WSSL dominates compute time at ~81%. These results indicate a viable, energy-efficient path for edge inference of transformer-like models using spike-form data. By integrating all three computation types within one hardware substrate, the design reduces memory bandwidth and improves throughput and power efficiency.
Abstract
Spiking Neural Networks (SNNs) and transformers represent two powerful paradigms in neural computation, known for their low power consumption and their ability to capture feature dependencies, respectively. However, transformer architectures typically involve multiple types of computational layers, including linear layers for MLP modules and classification heads, convolution layers for tokenizers, and dot-product computations for self-attention mechanisms. These diverse operations pose significant challenges for hardware accelerator design, and to our knowledge, no hardware solution yet leverages spike-form data from SNNs for transformer architectures. In this paper, we introduce VESTA, a novel hardware design that combines these technologies, presenting unified Processing Elements (PEs) capable of efficiently performing all three types of computations crucial to transformer structures. VESTA benefits from the spike-form outputs of the Spike Neuron Layers \cite{zhou2024spikformer}, which simplify each multiplication from a product of two 8-bit integers to a product of one 8-bit integer and a binary spike. This reduction allows multiplexers to replace multipliers in the PE module, significantly enhancing computational efficiency while maintaining the low-power advantage of SNNs. Experimental results show that the core area of VESTA is \(0.844\,\mathrm{mm}^2\). It operates at 500 MHz and is capable of real-time image classification at 30 fps.
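To illustrate why a binary spike lets a multiplexer stand in for a multiplier, the following is a minimal software sketch, not the VESTA PE itself. The function names `mux_select` and `spike_mac` are hypothetical, and the signed 8-bit weight range is an assumption made for the example. Each spike acts as a select signal that passes either the weight or zero to the accumulator.

```python
def mux_select(weight: int, spike: int) -> int:
    """Model a 2-to-1 multiplexer: pass the weight when spike is 1, else 0."""
    if spike not in (0, 1):
        raise ValueError("spike must be binary (0 or 1)")
    if not -128 <= weight <= 127:
        # Assumption for this sketch: weights are signed 8-bit integers.
        raise ValueError("weight must fit in a signed 8-bit integer")
    return weight if spike else 0


def spike_mac(weights, spikes):
    """Accumulate spike-gated weights without any multiplication."""
    if len(weights) != len(spikes):
        raise ValueError("weights and spikes must have the same length")
    acc = 0
    for w, s in zip(weights, spikes):
        acc += mux_select(w, s)  # select-and-add replaces multiply-and-add
    return acc


if __name__ == "__main__":
    weights = [5, -3, 7, -128]
    spikes = [1, 0, 1, 1]
    print(spike_mac(weights, spikes))
```

Because the spike is either 0 or 1, the selected value always equals the product `weight * spike`, so the accumulated result matches an ordinary dot product while requiring only selection and addition.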
