CORVET: A CORDIC-Powered, Resource-Frugal Mixed-Precision Vector Processing Engine for High-Throughput AIoT applications

Sonu Kumar; Mohd Faisal Khan; Mukul Lokhande; Santosh Kumar Vishvakarma

TL;DR

A runtime-adaptive, performance-enhanced vector engine featuring a low-resource, iterative CORDIC-based MAC unit for edge AI acceleration that enables dynamic reconfiguration between approximate and accurate modes, exploiting the latency-accuracy trade-off for a wide range of workloads.

Abstract

This brief presents a runtime-adaptive, performance-enhanced vector engine featuring a low-resource, iterative CORDIC-based MAC unit for edge AI acceleration. The proposed design enables dynamic reconfiguration between approximate and accurate modes, exploiting the latency-accuracy trade-off across a wide range of workloads. Its resource-efficient approach further enables up to 4× throughput improvement within the same hardware resources by leveraging vectorised, time-multiplexed execution and flexible precision scaling. With a time-multiplexed multi-activation-function (multi-AF) block and a lightweight pooling and normalisation unit, the proposed vector engine supports flexible precision (4/8/16-bit) and high MAC density. The ASIC implementation results show that each MAC stage saves up to 33% in time and 21% in power, and that a 256-PE configuration achieves higher compute density (4.83 TOPS/mm²) and energy efficiency (11.67 TOPS/W) than previous state-of-the-art work. A detailed hardware-software co-design methodology for object detection and classification tasks on the Pynq-Z2 is discussed to assess the proposed architecture, demonstrating a scalable, energy-efficient solution for edge AI applications.
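As a rough illustration of the latency-accuracy trade-off mentioned in the abstract, the sketch below models a multiply-accumulate using the textbook linear-mode (rotation) CORDIC recurrence, in which more iterations give a more accurate product. This is an assumption made for illustration: the abstract does not specify the CORDIC mode, and the function names `cordic_mac` and `cordic_dot` and the iteration counts (4 and 12) are hypothetical, not taken from the paper. Floating-point scaling by 2^-i stands in for the arithmetic right shifts a hardware datapath would use.

```python
def cordic_mac(acc, x, w, iterations):
    """Approximate acc + x * w with linear-mode CORDIC.

    Assumes |w| <= 2, the convergence range when the iteration index starts at 0.
    After n iterations the product error is at most |x| * 2**-(n - 1).
    """
    if abs(w) > 2:
        raise ValueError("linear CORDIC starting at i = 0 needs |w| <= 2")
    y, z = acc, w
    for i in range(iterations):
        step = 2.0 ** -i               # hardware: arithmetic right shift by i
        d = 1.0 if z >= 0 else -1.0    # steer the residual z towards zero
        y += d * x * step              # accumulate one shifted copy of x
        z -= d * step                  # consume the matching part of w
    return y


def cordic_dot(xs, ws, iterations):
    """Dot product built by chaining cordic_mac over the element pairs."""
    acc = 0.0
    for x, w in zip(xs, ws):
        acc = cordic_mac(acc, x, w, iterations)
    return acc


if __name__ == "__main__":
    xs = [0.5, -0.25, 0.75]
    ws = [0.3, 0.8, -0.6]
    exact = sum(x * w for x, w in zip(xs, ws))
    for n in (4, 12):  # hypothetical "approximate" and "accurate" settings
        approx = cordic_dot(xs, ws, n)
        bound = sum(abs(x) for x in xs) * 2.0 ** -(n - 1)
        print(f"iterations={n} result={approx} exact={exact} error_bound={bound}")
```

In this model the iteration count is the only knob: each extra iteration halves the worst-case error bound at the cost of one more shift-add step, which is the kind of trade-off a runtime-configurable iteration depth exposes.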

Paper Structure (28 sections, 5 equations, 13 figures, 5 tables)

Introduction
Architecture Overview
Vector Engine Organization
Runtime Accuracy and Precision Adaptation
Control Engine and Data Flow
Memory mapping
Time-Multiplexed Multi-Activation-Function Integration
Scalability and System Integration
Circuit Implementation
Runtime-Adaptive Iterative CORDIC-Based MAC
Latency Hiding Through Vector-Level Parallelism
Absolute Average Deviation (AAD) Pooling Block
Time-Multiplexed Multi-Activation-Function Block
Peripheral Support and Integration
Experimental Methodology
...and 13 more sections

Figures (13)

Figure 1: Block-level architecture of the proposed CORDIC-based vector engine integrated within a resource-efficient deep learning accelerator.
Figure 2: Control engine for efficient reuse of data and control signals in a layer-multiplexed scheme that reuses the same DNN architecture.
Figure 3: DNN accelerator data flow and the order in which data loading is initialised.
Figure 4: Memory mapping scheme showing the address bits required to address the weights and biases of individual neurons.
Figure 5: Iterative low-latency CORDIC-based MAC architecture with runtime-configurable iteration depth.
...and 8 more figures
