Retrospective: A CORDIC Based Configurable Activation Function for NN Applications

Omkar Kokane, Gopal Raut, Salim Ullah, Mukul Lokhande, Adam Teman, Akash Kumar, Santosh Kumar Vishvakarma

TL;DR

The paper addresses the activation function bottleneck in AI accelerators by proposing a dynamically configurable, CORDIC-based AF core (DA-VINCI). It introduces the NEURIC neuron engine built atop a reconfigurable Shift-and-Add CORDIC MAC+AF unit, enabling runtime switching among Swish, SoftMax, SELU, GELU, Sigmoid, Tanh, and ReLU with minimal overhead. Hardware evaluations across SystemVerilog implementations, FPGA prototypes, and ASIC-level analyses demonstrate a QoR of 98.5% and substantial gains in area, delay, and energy efficiency, validating the approach for resource-constrained DNNs, RNNs/LSTMs, and Transformers. Collectively, the work offers a practical path to high-density, precision-scalable AI accelerators that maintain accuracy while reducing dark-silicon costs and power in edge and embedded contexts.

Abstract

A CORDIC-based configuration for the design of Activation Functions (AF) was previously suggested to accelerate ASIC hardware design for resource-constrained systems by providing functional reconfigurability. Since its introduction, this new approach for neural network acceleration has gained widespread popularity, influencing numerous designs for activation functions in both academic and commercial AI processors. In this retrospective analysis, we explore the foundational aspects of this initiative, summarize key developments over recent years, and introduce the DA-VINCI AF tailored for the evolving needs of AI applications. This new generation of dynamically configurable and precision-adjustable activation function cores promises greater adaptability for a range of activation functions in AI workloads, including Swish, SoftMax, SELU, and GELU, utilizing the Shift-and-Add CORDIC technique. The previously presented design has been optimized for MAC, Sigmoid, and Tanh functionalities and incorporated into ReLU AFs, culminating in an accumulative NEURIC compute unit. These enhancements position NEURIC as a fundamental component in the resource-efficient vector engine for the realization of AI accelerators that focus on DNNs, RNNs/LSTMs, and Transformers, achieving a quality of results (QoR) of 98.5%.
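To make the Shift-and-Add CORDIC idea concrete, the sketch below evaluates Tanh and Sigmoid with textbook hyperbolic CORDIC in rotation mode. It is an illustrative floating-point model, not the DA-VINCI or NEURIC datapath: the function names, the 16-iteration default, and the final floating-point division are assumptions made for this example (a hardware design would use fixed-point shifts and could replace the division with a linear-mode CORDIC stage). The basic algorithm only converges for inputs of magnitude up to roughly 1.1, so a real core would need range extension that is not shown here.

```python
import math


def hyperbolic_iterations(n):
    """Return n shift indices 1, 2, 3, 4, 4, 5, ...; indices 4, 13, 40, ...
    are repeated, which hyperbolic CORDIC needs in order to converge."""
    seq, i, repeat = [], 1, 4
    while len(seq) < n:
        seq.append(i)
        if i == repeat and len(seq) < n:
            seq.append(i)
            repeat = 3 * repeat + 1
        i += 1
    return seq


def cordic_sinh_cosh(theta, n=16):
    """Hyperbolic CORDIC, rotation mode: returns (sinh(theta), cosh(theta)).
    Valid for |theta| up to about 1.1 without range extension."""
    shifts = hyperbolic_iterations(n)
    # Constant gain of the iteration sequence, precomputed once in hardware.
    gain = math.prod(math.sqrt(1.0 - 2.0 ** (-2 * i)) for i in shifts)
    x, y, z = 1.0 / gain, 0.0, theta
    for i in shifts:
        d = 1.0 if z >= 0 else -1.0
        # ldexp(v, -i) models an arithmetic right shift by i bits.
        x, y = x + d * math.ldexp(y, -i), y + d * math.ldexp(x, -i)
        z -= d * math.atanh(2.0 ** -i)  # small lookup table in hardware
    return y, x


def cordic_tanh(v, n=16):
    s, c = cordic_sinh_cosh(v, n)
    return s / c  # assumption: plain division stands in for a divider stage


def cordic_sigmoid(v, n=16):
    # Identity: sigmoid(v) = 0.5 * (1 + tanh(v / 2))
    return 0.5 * (1.0 + cordic_tanh(0.5 * v, n))


if __name__ == "__main__":
    for v in (-1.0, 0.25, 1.0):
        print(v, cordic_tanh(v), math.tanh(v))
```

Each iteration uses only additions, shifts, and one table lookup, which is why a single datapath of this kind can be shared across several activation functions. Increasing `n` trades latency for precision.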

Paper Structure

This paper contains 7 sections, 2 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: The detailed micro-architecture of the DA-VINCI core with integrated data flow and control signals.
  • Figure 2: Comparison of error estimation (relative to the FP32 baseline) with state-of-the-art works [Designspaceexploration-AF, RECON, ReAFM-NN, CORDICAF-LSTM, SoftAct-Trans].
  • Figure 3: Performance analysis (ASIC: Energy Efficiency vs Compute Density) with state-of-the-art works [TCASI23-Softmax, RECON, SoftMax-taylor-DNN].
  • Figure 4: Performance-Enhanced Dynamically Configurable Layer Multiplexed Vector Engine for AI Acceleration.
  • Figure 5: Comparison of AI application accuracy evaluation with state-of-the-art works [Designspaceexploration-AF, RECON, SoftMax-taylor-DNN, SoftAct-Trans, ReAFM-NN, CORDICAF-LSTM].