Efficient yet Accurate End-to-End SC Accelerator Design

Meng Li, Yixuan Hu, Tengyu Zhang, Renjie Wei, Yawen Zhang, Ru Huang, Runsheng Wang

TL;DR

The paper addresses the challenge of delivering accurate yet efficient end-to-end stochastic computing (SC) neural network (NN) acceleration for state-of-the-art models. It combines deterministic thermometer coding with a bitonic sorting network (BSN) to achieve exact end-to-end SC NN acceleration, and introduces SC-friendly model designs that use high-precision residual fusion to compensate for low-precision activations. It further improves hardware efficiency with approximate spatial-temporal BSN designs that reduce the area-delay product (ADP) while preserving accuracy across layers, which allows flexible adaptation to modern architectures. The silicon-proven accelerator achieves high energy efficiency and fault tolerance, and the work lays the groundwork for future end-to-end SC accelerators, including transformer-focused implementations.

Abstract

Providing end-to-end stochastic computing (SC) neural network acceleration for state-of-the-art (SOTA) models has become increasingly challenging, because it demands accuracy without sacrificing efficiency. It also requires end-to-end SC circuits that flexibly support the different types and sizes of operations found in these models. In this paper, we summarize our recent research on end-to-end SC neural network acceleration. We introduce an accurate end-to-end SC accelerator based on deterministic coding and a sorting network. In addition, we propose an SC-friendly model that combines low-precision data paths with high-precision residuals. We introduce approximate computing techniques to optimize SC nonlinear adders and provide new SC designs for the arithmetic operations required by SOTA models. Overall, co-optimizing the circuit design and the model yields further significant improvements in circuit efficiency, flexibility, and compatibility. The results demonstrate that the proposed end-to-end SC architecture achieves accurate and efficient neural network acceleration while flexibly accommodating model requirements, showcasing the potential of SC in neural network acceleration.
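
To make the idea of deterministic coding plus a sorting network concrete, the following is a minimal sketch and not the authors' implementation. It assumes one common thermometer convention: a signed integer v in [-L/2, L/2] is stored as L bits with v + L/2 ones grouped at the front, where L is the bitstream length. Under that assumption, sorting the concatenation of several streams yields another thermometer-coded stream whose value is exactly the sum of the inputs. The function names (`encode`, `decode`, `bitonic_sort_desc`, `accumulate`) are hypothetical and chosen only for illustration.

```python
def encode(value, length):
    """Thermometer-encode a signed integer (assumed convention: ones first)."""
    ones = value + length // 2
    if not 0 <= ones <= length:
        raise ValueError("value out of range for this bitstream length")
    return [1] * ones + [0] * (length - ones)


def decode(bits):
    """Recover the signed value from the number of ones in the stream."""
    return sum(bits) - len(bits) // 2


def bitonic_sort_desc(bits):
    """Sort bits in descending order with a bitonic network.

    The length must be a power of two. On 1-bit data each compare-exchange
    reduces to an OR gate (max) and an AND gate (min).
    """
    n = len(bits)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = list(bits)
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    hi = bits[i] | bits[partner]  # max of two bits
                    lo = bits[i] & bits[partner]  # min of two bits
                    if (i & k) == 0:
                        bits[i], bits[partner] = hi, lo  # descending block
                    else:
                        bits[i], bits[partner] = lo, hi  # ascending block
            j //= 2
        k *= 2
    return bits


def accumulate(values, length):
    """Add thermometer-coded values by sorting their concatenated streams."""
    concatenated = [b for v in values for b in encode(v, length)]
    return bitonic_sort_desc(concatenated)


if __name__ == "__main__":
    inputs = [3, -2, 1, -4]
    sorted_stream = accumulate(inputs, length=8)
    print(decode(sorted_stream), sum(inputs))
```

Because the sorted stream is itself thermometer coded, its value depends only on the count of ones, so the accumulation is exact rather than statistical. In this sketch the number of inputs times the bitstream length must be a power of two.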

Paper Structure (15 sections, 1 equation, 13 figures, 5 tables)

Figures (13)

  • Figure 1: FSM-based design to implement (a) tanh and (b) ReLU. Ideally, the circuit output is the same as the exact output, marked by the red line.
  • Figure 2: The trade-off between inference accuracy and efficiency (measured by area-delay product, ADP). Here, the weight bitstream length (BSL) is fixed to 2 bits and the activation BSL is swept.
  • Figure 3: (a) The truth table and circuit of the ternary SC multiplier. (b) The BSN and the selective interconnect system for accumulation and the activation function.
  • Figure 4: (a) Current and (b) energy efficiency versus supply voltage at different working frequencies.
  • Figure 5: Accuracy loss of the conventional binary design and proposed SC design versus bit error rate, at the soft accuracy of 98.28%.
  • ...and 8 more figures