
HYDRA: Hybrid Data Multiplexing and Run-time Layer Configurable DNN Accelerator

Sonu Kumar, Komal Gupta, Gopal Raut, Mukul Lokhande, Santosh Kumar Vishvakarma

TL;DR

HYDRA addresses edge DNN deployment challenges by introducing a layer-multiplexed accelerator that reuses the same hardware to execute networks of varying depth. It couples a $1$-D array of FMA units with a runtime layer-configurable design and a single activation function accessed through a parallel-in serial-out (PISO) path, enabling $L$-layer networks with reduced area and power. Experimental results show reductions of over 90% in power and resource usage relative to state-of-the-art designs, with $35.21$ TOPS/W achieved at 100 MHz for a $64:32:32:10$ network. The work demonstrates practical edge deployment potential for DNNs through scalable hardware reuse and configurability, enabling efficient MNIST/CIFAR-10-style workloads.

Abstract

Deep neural networks (DNNs) are difficult to execute efficiently at edge nodes, primarily because of their large hardware resource demands. This article proposes HYDRA, a hybrid data multiplexing and runtime layer-configurable DNN accelerator, to overcome these drawbacks. The work adopts a layer-multiplexed approach that further reuses a single activation function (AF) within the execution of a single layer, together with an improved fused multiply-accumulate (FMA) unit. The proposed approach works in iterative mode, reusing the same hardware to execute different layers in a configurable fashion. The proposed architecture reduces power consumption and resource utilization by over 90% relative to state-of-the-art works and achieves 35.21 TOPS/W. It also reduces by (N-1) times the area overhead required for bandwidth, AF, and layer architecture. This work shows that the HYDRA architecture supports optimal DNN computations while improving performance on resource-constrained edge devices.
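
The iterative, layer-multiplexed execution described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the function names (`fma_array`, `piso_activation`, `run_layer_multiplexed`, `make_layers`) are hypothetical, ReLU is assumed as the activation function, arithmetic is floating-point rather than quantised, and the weights are random. It uses the $64:32:32:10$ network mentioned in the TL;DR to show one FMA array and one activation function being reused for every layer.

```python
import random


def relu(x):
    # Assumed activation function; the source does not name the AF here.
    return x if x > 0.0 else 0.0


def fma_array(inputs, weights, biases):
    """Model a 1-D array of FMA units: unit j computes acc = acc + x_i * w[j][i]."""
    accs = list(biases)
    for i, x in enumerate(inputs):       # one input is broadcast per step
        for j in range(len(accs)):       # units would run in parallel in hardware
            accs[j] = accs[j] + x * weights[j][i]
    return accs


def piso_activation(accs, af):
    """Model the PISO path: parallel results pass one at a time through a single AF."""
    return [af(a) for a in accs]


def run_layer_multiplexed(x, layers, af=relu):
    """Reuse the same FMA array and the same AF for each layer in turn."""
    for weights, biases in layers:
        x = piso_activation(fma_array(x, weights, biases), af)
    return x


def make_layers(sizes, seed=0):
    """Build random (weights, biases) pairs; illustrative values only."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        w = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
    return layers


sizes = [64, 32, 32, 10]
layers = make_layers(sizes)
outputs = run_layer_multiplexed([1.0] * sizes[0], layers)
print(len(outputs))
```

In this model, adding a layer only adds one more loop iteration and one more set of weights; the compute functions themselves are shared, which is the software analogue of reusing the same hardware across layers.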

Paper Structure

This paper contains 7 sections, 2 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: DNN architecture with Conv. and FC layers. FMA computation is performed on ifmaps and kernels.
  • Figure 2: Proposed runtime-configurable, layer-reused HYDRA architecture, with FMA and AF-reused layer subunits.
  • Figure 3: Finite-state machine for the proposed FMA, followed by the PISO and the single activation function reused under control signals.
  • Figure 4: Comparison of (a) power, area, and critical-path delay with state-of-the-art 8-bit FMA units [ref1, ref2, ref6, ref4, ref5], and (b) quantisation impact on classification accuracy.