ASTER: Attention-based Spiking Transformer Engine for Event-driven Reasoning

Tamoghno Das, Khanh Phan Vu, Hanning Chen, Hyunwoo Oh, Mohsen Imani

TL;DR

ASTER introduces a memory-centric Processing-in-Memory (PIM) accelerator for spike-driven transformers, addressing the energy and bandwidth challenges of event-driven vision. The work combines a hybrid analog–digital PIM backend with a sparsity-aware software stack that performs layer skipping and timestep reduction, guided by Bayesian optimization over TAFT and CBET. Key contributions include a spike-driven self-attention data path, programmable selective wordline activation, and a dataflow that reuses weights and membrane states across timesteps. Empirical results show energy reductions of up to ~467x over an edge GPU (Jetson Orin Nano) and ~1.86x over prior PIM accelerators for spiking transformers, evaluated on ImageNet and DVS-based datasets (CIFAR-10 DVS, DVSGesture), enabling real-time edge inference and paving the way for neurosymbolic integration at the extreme edge.

Abstract

The integration of spiking neural networks (SNNs) with transformer-based architectures has opened new opportunities for bio-inspired, low-power, event-driven visual reasoning on edge devices. However, the high temporal resolution and binary nature of spike-driven computation introduce architectural mismatches with conventional digital hardware (CPU/GPU). Prior neuromorphic and Processing-in-Memory (PIM) accelerators struggle with the high sparsity and complex operations prevalent in such models. To address these challenges, we propose a memory-centric hardware accelerator tailored for spiking transformers, optimized for deployment in real-time event-driven frameworks such as classification with both static and event-based input frames. Our design leverages a hybrid analog-digital PIM architecture with input sparsity optimizations and a custom-designed dataflow that minimizes memory access overhead and maximizes data reuse under spatiotemporal sparsity, for compute- and memory-efficient end-to-end execution of spiking transformers. We subsequently propose inference-time software optimizations for layer skipping and timestep reduction, leveraging Bayesian Optimization with surrogate modeling to perform robust, efficient co-exploration of the joint algorithmic-microarchitectural design spaces under tight computational budgets. Evaluated on both image (ImageNet) and event-based (CIFAR-10 DVS, DVSGesture) classification, the accelerator achieves up to ~467x and ~1.86x energy reduction compared to an edge GPU (Jetson Orin Nano) and previous PIM accelerators for spiking transformers, respectively, while maintaining competitive task accuracy on the ImageNet dataset. This work enables a new class of intelligent ubiquitous edge AI, built using spiking transformer acceleration for low-power, real-time visual processing at the extreme edge.
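
The timestep reduction mentioned in the abstract (described in the Figure 3 caption as "timestep reduction by early exit") can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that CBET acts as a confidence threshold on the running class probabilities (the listing does not define the acronym), that per-timestep output logits are averaged over time, and that softmax confidence is the exit criterion. The names `softmax` and `early_exit_inference` are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_inference(step_logits, threshold=0.99):
    """Stop processing timesteps once the running prediction is confident.

    step_logits: list of per-timestep logit lists, shape (T, num_classes).
    threshold:   assumed confidence threshold (interpreting CBET this way
                 is an assumption, not something the source states).
    Returns (predicted_class, timesteps_used).
    """
    if not step_logits:
        raise ValueError("need at least one timestep")
    acc = [0.0] * len(step_logits[0])
    for t, logits in enumerate(step_logits, start=1):
        # Accumulate the output over time, as rate-coded SNN heads often do.
        acc = [a + l for a, l in zip(acc, logits)]
        probs = softmax([a / t for a in acc])
        confidence = max(probs)
        if confidence >= threshold:
            break  # remaining timesteps are skipped
    return probs.index(confidence), t

# Toy input: evidence for class 0 grows after the first timestep.
toy = [[1.0, 0.5, 0.0], [9.0, 0.0, 0.0], [9.0, 0.0, 0.0], [9.0, 0.0, 0.0]]
prediction, used = early_exit_inference(toy, threshold=0.99)
```

On this toy input the loop exits before consuming all four timesteps. Under this reading, easy inputs would use fewer timesteps than hard ones, which is consistent with the per-class average timestep reported in Figure 4.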

Paper Structure

This paper contains 18 sections, 3 equations, 12 figures, 2 tables, 1 algorithm.

Figures (12)

  • Figure 1: Memory Access Distribution for Patch Embedding in Spiking Transformer on ImageNet
  • Figure 2: Layerwise Spike Firing Rates for SDT-8-512, illustrating extremely high sparsity for outputs of spiking self-attention
  • Figure 3: Software-side framework with layer skipping by attention activity and timestep reduction by early exit.
  • Figure 4: Per-class Average Timestep vs Accuracy for 2-256-t16 SDT on CIFAR10-DVS with CBET=0.99.
  • Figure 5: Overall hardware architecture of ASTER, showing hierarchy at chip, tile, and subarray levels.
  • ...and 7 more figures