ASCEND: Accurate yet Efficient End-to-End Stochastic Computing Acceleration of Vision Transformer

Tong Xie, Yixuan Hu, Renjie Wei, Meng Li, Yuan Wang, Runsheng Wang, Ru Huang

TL;DR

ASCEND demonstrates the first end-to-end stochastic-computing accelerator for Vision Transformers by co-designing SC circuits and ViT networks to address GELU and softmax. It introduces a deterministic SC GELU block via gate-assisted selective interconnect and an iterative approximate softmax circuit, paired with a two-stage SC-friendly training pipeline (progressive quantization and approximate-softmax aware fine-tuning) to produce low-precision ViTs. The GELU and softmax blocks yield 56.3% and 22.6% reductions in mean absolute error and achieve 5.29× and 12.6× reductions in area-delay product, respectively, with substantial accuracy gains over baseline low-precision ViTs on CIFAR10/100. This work offers a practical path to accurate, efficient ViT acceleration using end-to-end SC and a flexible design space for hardware accuracy-efficiency trade-offs.

Abstract

Stochastic computing (SC) has emerged as a promising computing paradigm for neural acceleration. However, how to accelerate the state-of-the-art Vision Transformer (ViT) with SC remains unclear. Unlike convolutional neural networks, ViTs introduce notable compatibility and efficiency challenges because of their nonlinear functions, e.g., softmax and Gaussian Error Linear Units (GELU). This paper proposes ASCEND, the first ViT accelerator based on end-to-end SC. ASCEND co-designs the SC circuits and ViT networks to enable accurate yet efficient acceleration. To overcome the compatibility challenges, ASCEND proposes a novel deterministic SC block for GELU and leverages an SC-friendly iterative approximate algorithm to design an accurate and efficient softmax circuit. To improve inference efficiency, ASCEND develops a two-stage training pipeline to produce accurate low-precision ViTs. With extensive experiments, we show that the proposed GELU and softmax blocks achieve 56.3% and 22.6% error reduction, respectively, compared to existing SC designs, and reduce the area-delay product (ADP) by 5.29× and 12.6×, respectively. Moreover, compared to the baseline low-precision ViTs, ASCEND also achieves significant accuracy improvements on CIFAR10 and CIFAR100.
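
As background on why SC is attractive for acceleration, the following minimal Python sketch illustrates the generic textbook idea of unipolar stochastic computing: a value in [0, 1] is encoded as a bitstream whose fraction of 1s approximates the value, and multiplying two independent streams reduces to a bitwise AND. This is not ASCEND's circuitry (the paper's GELU block is deterministic, and its softmax uses an iterative approximation not reproduced here); the function names, the stream length, and the seed are all assumptions made for illustration.

```python
import random

def encode_unipolar(value, length, rng):
    """Encode a value in [0, 1] as a bitstream; each bit is 1 with probability `value`."""
    return [1 if rng.random() < value else 0 for _ in range(length)]

def decode_unipolar(bits):
    """Recover the encoded value as the fraction of 1s in the stream."""
    return sum(bits) / len(bits)

def sc_multiply(a_bits, b_bits):
    """Multiply two independent unipolar streams with one AND gate per bit."""
    return [a & b for a, b in zip(a_bits, b_bits)]

# Hypothetical parameters chosen for the illustration only.
rng = random.Random(0)
length = 4096

a_bits = encode_unipolar(0.6, length, rng)
b_bits = encode_unipolar(0.5, length, rng)
product_estimate = decode_unipolar(sc_multiply(a_bits, b_bits))

# The estimate approximates 0.6 * 0.5 = 0.3; the error shrinks as `length` grows.
print(product_estimate)
```

The sketch also hints at the trade-off the abstract refers to: arithmetic becomes very cheap in hardware, but accuracy depends on stream length, and nonlinear functions such as GELU and softmax have no equally simple gate-level counterpart.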

Paper Structure

This paper contains 19 sections, 2 equations, 8 figures, 6 tables, 1 algorithm.

Figures (8)

  • Figure 1: The diagram of a transformer encoder block.
  • Figure 2: GELU by (a) FSM-based design, (b) 4-term Bernstein polynomial, (c) naive SI-based design, and (d) the proposed gate-assisted SI design.
  • Figure 3: The circuit/network co-design of the proposed ASCEND.
  • Figure 4: (a) Gate-assisted SI implements non-monotonic functions with the help of simple combinational logic. (b) Ternary GELU implemented by (a).
  • Figure 5: SC circuit block of Iterative Approximate Softmax.
  • ...and 3 more figures