Stochastic Multivariate Universal-Radix Finite-State Machine: a Theoretically and Practically Elegant Nonlinear Function Approximator

Xincheng Feng, Guodong Shen, Jianhao Hu, Meng Li, Ngai Wong

TL;DR

This work addresses the hardware cost of nonlinear function computation in AI by introducing SMURF, a stochastic multivariate universal-radix FSM that uses stochastic computing to approximate multivariate nonlinear functions with low area and power. It derives steady-state probabilities and convex-optimization-based weight tuning for univariate and multivariate targets, and demonstrates accurate approximations of functions such as Euclidean distance, the Hartley transform, and softmax, as well as their integration into a CNN. Across software and FPGA benchmarks, SMURF achieves accuracy comparable to conventional methods while requiring about $16.07\%$ of the area and $14.45\%$ of the power of Taylor-series approximation, and about $2.22\%$ of the area of LUT-based schemes, highlighting strong potential for energy-efficient edge AI. The results support SMURF as a versatile, hardware-friendly nonlinear function engine that can produce multiple outputs from a single architecture with configurable parameters.

Abstract

Nonlinearities are crucial for capturing complex input-output relationships, especially in deep neural networks. However, nonlinear functions often incur substantial hardware and compute overheads. Stochastic computing (SC) has emerged as a promising approach to this challenge, trading output precision for hardware simplicity. To this end, this paper proposes a first-of-its-kind stochastic multivariate universal-radix finite-state machine (SMURF) that harnesses SC to generate multivariate nonlinear functions with simple hardware and high accuracy. We present the finite-state machine (FSM) architecture for SMURF, as well as analytical derivations of sampling gate coefficients for accurately approximating generic nonlinear functions. Experiments demonstrate the superiority of SMURF, which requires only 16.07% of the area and 14.45% of the power consumption of Taylor-series approximation, and merely 2.22% of the area of look-up table (LUT) schemes.

Paper Structure

This paper contains 19 sections, 22 equations, 10 figures, and 6 tables.

Figures (10)

  • Figure 1: The architecture of a stochastic number generator (SNG).
  • Figure 2: Stochastic multiplication and addition.
  • Figure 3: Mapping the variables of a function to the spatial domain.
  • Figure 4: The architecture of a chained $N$-state FSM where $x_b$ denotes the current bitstream (binary) value from a $\theta$-gate.
  • Figure 5: (a)-(d) The steady-state probabilities of 2-, 3-, 4-, and 5-state FSMs.
  • ...and 5 more figures
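
Figures 1 and 2 concern the stochastic number generator (SNG) and stochastic multiplication and addition. As general background, the sketch below shows the textbook unipolar form of these operations: a value in [0, 1] is encoded as the fraction of 1s in a random bitstream, an AND gate multiplies two independent streams, and a multiplexer performs a scaled addition. It is an illustrative software model under these standard assumptions, not the circuits used in the paper; the function names (`sng`, `sc_multiply`, `sc_scaled_add`) and the stream length are hypothetical choices made for this example.

```python
import random


def sng(p, length, rng):
    """Model of an SNG: emit 1 whenever a fresh uniform sample falls below p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def value(bits):
    """Decode a unipolar bitstream as the fraction of 1s."""
    return sum(bits) / len(bits)


def sc_multiply(a_bits, b_bits):
    """Bitwise AND of two independent streams encodes the product a * b."""
    return [a & b for a, b in zip(a_bits, b_bits)]


def sc_scaled_add(a_bits, b_bits, select_bits):
    """A multiplexer picks a when the select bit is 1, else b.

    With a select stream of probability 0.5 this encodes (a + b) / 2.
    """
    return [a if s else b for a, b, s in zip(a_bits, b_bits, select_bits)]


rng = random.Random(0)          # fixed seed so the run is repeatable
n = 1 << 14                     # stream length (assumed for illustration)
a = sng(0.6, n, rng)
b = sng(0.5, n, rng)
s = sng(0.5, n, rng)            # select stream for the multiplexer

product = value(sc_multiply(a, b))          # close to 0.6 * 0.5 = 0.30
scaled_sum = value(sc_scaled_add(a, b, s))  # close to (0.6 + 0.5) / 2 = 0.55
print(f"product ~ {product:.3f}, scaled sum ~ {scaled_sum:.3f}")
```

The results are estimates, not exact values: their error shrinks as the stream gets longer, which is the precision-for-simplicity trade-off the abstract refers to.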