HAWX: A Hardware-Aware FrameWork for Fast and Scalable ApproXimation of DNNs

Samira Nazari; Mohammad Saeed Almasi; Mahdi Taheri; Ali Azarpeyvand; Ali Mokhtari; Ali Mahani; Christian Herglotz

TL;DR

HAWX addresses the cost of exploring heterogeneous approximate components in DNN accelerators with a hardware-aware, multi-level sensitivity scoring framework. It profiles a DNN to compute hawx scores at the operator, filter, layer, and model levels, then couples those scores with predictive models for accuracy, power, and area to prune the design space and generate Pareto-optimal configurations for both spatial and temporal accelerators. Reported speedups over exhaustive search exceed $23\times$ at the layer level and $10^{6}\times$ at the filter level for LeNet-5, and reach up to $10^{4000}\times$ for EfficientNetLite, while accuracy stays comparable to exhaustive search; the benefits grow with network size. The work demonstrates practical hardware-aware approximate computing (AxC) design for edge AI on diverse accelerators.

Abstract

This work presents HAWX, a hardware-aware, scalable exploration framework that applies multi-level sensitivity scoring at different DNN abstraction levels (operator, filter, layer, and model) to guide the selective integration of heterogeneous AxC blocks. Supported by predictive models for accuracy, power, and area, HAWX accelerates the evaluation of candidate configurations, achieving over $23\times$ speedup in a layer-level search with two candidate approximate blocks and more than $(3 \times 10^{6})\times$ speedup in a filter-level search for LeNet-5 alone, while maintaining accuracy comparable to exhaustive search. Experiments across state-of-the-art DNN benchmarks such as VGG-11, ResNet-18, and EfficientNetLite demonstrate that the efficiency benefits of HAWX scale exponentially with network size. The HAWX hardware-aware search algorithm supports both spatial and temporal accelerator architectures, leveraging either off-the-shelf approximate components or customized designs.
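To make two ideas from the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows (a) why exhaustive search grows exponentially with the number of approximable units and (b) how a Pareto front over accuracy, power, and area can be extracted from already-evaluated candidates. The names `dominates`, `pareto_front`, `exhaustive_size`, and the `candidates` values are all hypothetical and chosen only for illustration; they are not results from the paper.

```python
from typing import List, Tuple

# A configuration is summarized as (accuracy, power, area).
# Assumption: accuracy is maximized; power and area are minimized.
Config = Tuple[float, float, float]


def dominates(a: Config, b: Config) -> bool:
    """Return True if a is no worse than b on every metric and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep only the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


def exhaustive_size(num_units: int, num_choices: int) -> int:
    """Number of configurations when each unit independently picks one of num_choices blocks."""
    return num_choices ** num_units


# Hypothetical evaluated candidates (illustrative numbers only).
candidates: List[Config] = [
    (0.99, 10.0, 5.0),
    (0.98, 8.0, 4.0),
    (0.97, 9.0, 4.5),  # dominated by (0.98, 8.0, 4.0)
    (0.90, 6.0, 3.0),
]

front = pareto_front(candidates)
```

With these illustrative inputs, `front` keeps three of the four candidates, and `exhaustive_size(5, 3)` evaluates to 243, while the same three choices applied to 50 units already gives about $7 \times 10^{23}$ configurations. That growth is the reason a score-guided search can pull further ahead of exhaustive search as networks get larger.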

Paper Structure (11 sections, 9 equations, 5 figures, 2 tables, 1 algorithm)

Introduction
Proposed Methodology
Approximation Scoring
HAWX-based Design Space Exploration
Hardware-Aware Approximate Configurations
Experimental Results
Experimental Setup
Results and Discussion
Search Runtime Comparison
Conclusion
Acknowledgements

Figures (5)

Figure 1: A high-level flow of the proposed methodology
Figure 2: Activation distribution of LeNet-5 on representative inputs
Figure 3: $hawx(o)$ distribution in LeNet-5
Figure 4: Comparison of accuracy (blue) and $hawx(model)$ (red) across DNNs
Figure 5: Trade-off between accuracy, power, and area for dataflow and FGPU architectures
