Spiking Transformer Hardware Accelerators in 3D Integration

Boxun Xu, Junyoung Hwang, Pruek Vanna-iampikul, Sung Kyu Lim, Peng Li

TL;DR

This paper presents the first work on 3D spiking transformer hardware architecture and design methodology, and demonstrates significant energy and delay improvements compared to conventional 2D CMOS integration.

Abstract

Spiking neural networks (SNNs) are powerful models of spatiotemporal computation and are well suited for deployment on resource-constrained edge devices and neuromorphic hardware due to their low power consumption. Leveraging attention mechanisms similar to those found in their artificial neural network counterparts, recently emerged spiking transformers have showcased promising performance and efficiency by capitalizing on the binary nature of spiking operations. Recognizing the current lack of dedicated hardware support for spiking transformers, this paper presents the first work on 3D spiking transformer hardware architecture and design methodology. We present an architecture and physical design co-optimization approach tailored specifically for spiking transformers. Through memory-on-logic and logic-on-logic stacking enabled by 3D integration, we demonstrate significant energy and delay improvements compared to conventional 2D CMOS integration.
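
As a rough illustration of what the "binary nature of spiking operations" can mean computationally: when inputs are spikes in {0, 1}, a weighted sum needs no multiplications, only additions of the weights whose input fired. The sketch below is not taken from the paper and does not model the proposed accelerator; the function names `dense_dot` and `spike_accumulate` and the sample values are hypothetical, chosen only for illustration.

```python
def dense_dot(weights, activations):
    """Conventional weighted sum: one multiply and one add per input."""
    return sum(w * a for w, a in zip(weights, activations))


def spike_accumulate(weights, spikes):
    """Weighted sum for binary spikes: add a weight only where a spike occurred.

    Assumes every entry of `spikes` is 0 or 1 (illustrative assumption).
    """
    total = 0
    for w, s in zip(weights, spikes):
        if s:  # a spike gates the addition; no multiplication is performed
            total += w
    return total


# Hypothetical example values.
weights = [3, -1, 4, 2]
spikes = [1, 0, 1, 1]

# Both forms agree when the inputs are binary.
result = spike_accumulate(weights, spikes)
reference = dense_dot(weights, spikes)
```

With binary inputs the two functions return the same value, but the spike-driven version skips work entirely for inputs that did not fire, which is one plausible reading of the efficiency argument in the abstract.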

Paper Structure

This paper contains 25 sections, 2 equations, 7 figures, 5 tables, 2 algorithms.

Figures (7)

  • Figure 1: (a) Model Architecture of spiking transformers. (b) Multi-head spiking self-attention block within each spiking encoder block of spiking transformers.
  • Figure 2: Proposed 3D Architecture for processing spiking MLP layers: (a) 3D partitioning and dataflow, (b) systolic PE array for synaptic integration on the bottom tier, and (c) PE design.
  • Figure 3: Proposed 3D Architecture for processing spiking attention layers: (a) 3D partitioning, (b) proposed reconfigurable dense systolic array on the bottom tier, and (c) reconfigurable PE design supporting two different computations.
  • Figure 4: Cross-section view comparison between 2D and F2F 3D IC metal stack.
  • Figure 5: The layout comparison between 2D and 3D spiking MLP accelerators. (a) 2D design. (b)(c)(d) 3D design.
  • ...and 2 more figures