Efficient yet Accurate End-to-End SC Accelerator Design
Meng Li, Yixuan Hu, Tengyu Zhang, Renjie Wei, Yawen Zhang, Ru Huang, Runsheng Wang
TL;DR
The paper addresses the challenge of delivering accurate yet efficient end-to-end stochastic computing (SC) neural acceleration for state-of-the-art models. It combines deterministic thermometer coding with a bitonic sorting network to achieve exact end-to-end SC NN acceleration, and introduces SC-friendly model designs with high-precision residual fusion to compensate for low-precision activations. It further advances hardware efficiency with approximate spatial-temporal BSN designs that reduce area-delay product while preserving accuracy across layers, enabling flexible adaptation to modern architectures. The silicon-proven accelerator achieves high energy efficiency and fault tolerance, and the work lays groundwork for future end-to-end SC accelerators, including transformer-focused implementations.
Abstract
Providing end-to-end stochastic computing (SC) neural network acceleration for state-of-the-art (SOTA) models is increasingly challenging: the accelerator must preserve accuracy while remaining efficient, and its end-to-end SC circuits must flexibly support the different types and sizes of operations found in these models. In this paper, we summarize our recent research on end-to-end SC neural network acceleration. We introduce an accurate end-to-end SC accelerator based on deterministic coding and a sorting network. In addition, we propose an SC-friendly model that combines low-precision data paths with high-precision residuals. We introduce approximate computing techniques to optimize SC nonlinear adders and provide new SC designs for the arithmetic operations required by SOTA models. Overall, co-optimizing the circuit design and the model allows further significant improvements in circuit efficiency, flexibility, and compatibility. The results demonstrate that the proposed end-to-end SC architecture achieves accurate and efficient neural network acceleration while flexibly accommodating model requirements, showcasing the potential of SC in neural network acceleration.
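To illustrate why deterministic thermometer coding combined with a sorting network can give exact accumulation, the following is a minimal software sketch and not the authors' circuit. It assumes unsigned integer values, a thermometer code written as a run of ones followed by zeros, and a total bit count that is a power of two. The function names (`thermometer_encode`, `compare_exchange`, `bitonic_sort_descending`, `sc_accumulate`) are hypothetical and chosen only for this example. Sorting the concatenated input codes yields a valid thermometer code of their sum, and each compare-exchange on single bits reduces to one OR and one AND.

```python
def thermometer_encode(value, length):
    """Encode an integer 0 <= value <= length as `value` ones followed by zeros."""
    if not 0 <= value <= length:
        raise ValueError("value must lie in [0, length]")
    return [1] * value + [0] * (length - value)


def compare_exchange(bits, hi, lo):
    """Put the larger bit at index `hi` and the smaller at index `lo`.

    On single bits, max is OR and min is AND, so this models two logic gates.
    """
    a, b = bits[hi], bits[lo]
    bits[hi], bits[lo] = a | b, a & b


def bitonic_sort_descending(bits):
    """Sort a power-of-two-length bit list in descending order.

    The sequence of compare-exchange operations is fixed in advance and
    does not depend on the data, as in a hardware sorting network.
    """
    n = len(bits)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = list(bits)
    k = 2
    while k <= n:                      # size of the bitonic blocks being merged
        j = k // 2
        while j >= 1:                  # distance between compared positions
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    if i & k == 0:
                        compare_exchange(bits, i, partner)   # descending block
                    else:
                        compare_exchange(bits, partner, i)   # ascending block
            j //= 2
        k *= 2
    return bits


def sc_accumulate(values, length):
    """Add thermometer-coded values by sorting their concatenated bitstreams."""
    stream = []
    for v in values:
        stream.extend(thermometer_encode(v, length))
    return bitonic_sort_descending(stream)


if __name__ == "__main__":
    values = [3, 1, 4, 2]              # illustrative inputs, each coded on 4 bits
    result = sc_accumulate(values, length=4)
    # The sorted stream is itself a thermometer code of the exact sum.
    assert result == thermometer_encode(sum(values), 4 * len(values))
    print(result, "->", sum(result))
```

Because the output is again a thermometer code, it can be consumed directly by a following stage without converting back to binary, which is one plausible reading of how such a design stays exact end to end. Signed values, multiplication, and the approximate spatial-temporal variants mentioned above are outside the scope of this sketch.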
