Stochastic Multivariate Universal-Radix Finite-State Machine: a Theoretically and Practically Elegant Nonlinear Function Approximator

Xincheng Feng; Guodong Shen; Jianhao Hu; Meng Li; Ngai Wong

TL;DR

This work addresses the hardware burden of nonlinear function computations in AI by introducing SMURF, a stochastic multivariate universal-radix FSM that uses stochastic computing to approximate multivariate nonlinear functions with low area and power. It derives steady-state probabilities and convex-optimization-based weight tuning for univariate and multivariate targets, and demonstrates accurate approximations of functions such as Euclidean distance, Hartley transform, and softmax, as well as their integration into a CNN. Across software and FPGA benchmarks, SMURF achieves accuracy comparable to conventional methods while requiring about $16.07\%$ of the area and $14.45\%$ of the power of a Taylor-series approximation, and about $2.22\%$ of the area of LUT-based schemes, highlighting strong potential for energy-efficient edge AI. The results support SMURF as a versatile, hardware-friendly nonlinear function engine that can produce multiple outputs from a single architecture with configurable parameters.

Abstract

Nonlinearities are crucial for capturing complex input-output relationships, especially in deep neural networks. However, nonlinear functions often incur substantial hardware and compute overheads. Stochastic computing (SC) has emerged as a promising approach to this challenge, trading output precision for hardware simplicity. To this end, this paper proposes a first-of-its-kind stochastic multivariate universal-radix finite-state machine (SMURF) that harnesses SC to generate multivariate nonlinear functions with simple hardware and high accuracy. We present the finite-state machine (FSM) architecture for SMURF, as well as analytical derivations of the sampling gate coefficients needed to accurately approximate generic nonlinear functions. Experiments demonstrate the superiority of SMURF, which requires only 16.07% of the area and 14.45% of the power of a Taylor-series approximation, and merely 2.22% of the area of look-up table (LUT) schemes.
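To make the precision-for-simplicity trade concrete, the following is a minimal sketch of textbook unipolar stochastic computing, not the paper's implementation. It assumes values in $[0, 1]$ are encoded as the fraction of ones in a random bitstream; the helper names (`to_bitstream`, `sc_multiply`, `sc_scaled_add`) are illustrative and do not come from the paper. Under these assumptions, a bitwise AND of two independent streams approximates a product, and a multiplexer approximates a scaled sum.

```python
import random


def to_bitstream(p, length, rng):
    """Encode a probability p in [0, 1] as a unipolar stochastic bitstream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_bitstream(bits):
    """Decode a bitstream back to a value: the fraction of ones."""
    return sum(bits) / len(bits)


def sc_multiply(a_bits, b_bits):
    """Bitwise AND of independent streams: P(a AND b) = P(a) * P(b)."""
    return [a & b for a, b in zip(a_bits, b_bits)]


def sc_scaled_add(a_bits, b_bits, select_bits):
    """Multiplexer: pick a when the select bit is 1, otherwise b.

    With select probability s, the output encodes s*a + (1 - s)*b.
    """
    return [a if s else b for a, b, s in zip(a_bits, b_bits, select_bits)]


rng = random.Random(0)   # fixed seed so the sketch is reproducible
n = 100_000              # longer streams give lower variance

a = to_bitstream(0.6, n, rng)
b = to_bitstream(0.5, n, rng)
s = to_bitstream(0.5, n, rng)

product = from_bitstream(sc_multiply(a, b))          # close to 0.6 * 0.5
scaled_sum = from_bitstream(sc_scaled_add(a, b, s))  # close to (0.6 + 0.5) / 2
print(product, scaled_sum)
```

The results are estimates whose error shrinks as the stream length grows, which is the accuracy cost that SC pays for replacing a multiplier with a single gate.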

Paper Structure (19 sections, 22 equations, 10 figures, 6 tables)


Introduction
Preliminaries
Stochastic computing (SC)
Random sampling gates
FSM-based nonlinear function generator
SMURF Implementation
Architecture
Bivariate SMURF
Example 1
Example 2
Multivariate SMURF
Example 1
Discussion
Experimental results
Performance of SMURF in generic nonlinearities
...and 4 more sections

Figures (10)

Figure 1: The architecture of a stochastic number generator (SNG).
Figure 2: Stochastic multiplication and addition.
Figure 3: Mapping the variables of a function to the spatial domain.
Figure 4: The architecture of a chained $N$-state FSM where $x_b$ denotes the current bitstream (binary) value from a $\theta$-gate.
Figure 5: (a)-(d) The steady-state probabilities of 2-, 3-, 4-, and 5-state FSMs.
...and 5 more figures
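The steady-state probabilities referred to in Figures 4 and 5 can be illustrated with a short sketch. It assumes the common saturating form of a chained $N$-state FSM: an input bit of 1 moves one state to the right, a 0 moves one state to the left, and the end states hold their position. The paper's exact transition rules are not reproduced on this page, so this behavior is an assumption. For an input stream with probability $p$ of a 1, detailed balance then gives $\pi_i \propto p^i (1-p)^{N-1-i}$. The function names and the `gate_weights` values below are hypothetical, chosen only for illustration.

```python
import random


def steady_state(p, n_states):
    """Closed-form steady state of a saturating chained FSM (assumed model).

    Detailed balance gives pi[i+1] * (1 - p) = pi[i] * p, hence
    pi[i] is proportional to p**i * (1 - p)**(n_states - 1 - i).
    """
    weights = [p**i * (1 - p) ** (n_states - 1 - i) for i in range(n_states)]
    total = sum(weights)
    return [w / total for w in weights]


def simulate(p, n_states, steps, rng):
    """Estimate state occupancy by driving the FSM with a random bitstream."""
    state = 0
    visits = [0] * n_states
    for _ in range(steps):
        bit = 1 if rng.random() < p else 0
        if bit:
            state = min(state + 1, n_states - 1)  # saturate at the right end
        else:
            state = max(state - 1, 0)             # saturate at the left end
        visits[state] += 1
    return [v / steps for v in visits]


def fsm_output(p, gate_weights):
    """Expected output when each state emits a 1 with its own probability."""
    probs = steady_state(p, len(gate_weights))
    return sum(w * q for w, q in zip(gate_weights, probs))


rng = random.Random(1)
analytic = steady_state(0.7, 4)
simulated = simulate(0.7, 4, 200_000, rng)
print(analytic)
print(simulated)

# Hypothetical per-state output probabilities, for illustration only.
gate_weights = [0.0, 0.2, 0.8, 1.0]
print(fsm_output(0.7, gate_weights))
```

Under this model, the expected output is a weighted sum of the steady-state probabilities, so choosing the per-state weights shapes the nonlinear function of $p$ that the FSM approximates.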
