
Leveraging Highly Approximated Multipliers in DNN Inference

Georgios Zervakis, Fabio Frustaci, Ourania Spantidi, Iraklis Anagnostopoulos, Hussam Amrouch, Jörg Henkel

TL;DR

This paper addresses the power cost of DNN inference on edge devices by enabling highly approximate multipliers through a control variate mechanism that mitigates convolution errors at runtime without retraining. The method formalizes a correction term V that is added to the approximate convolution output, achieving zero-mean error and reduced variance for perforated, recursive, and truncated multipliers, and extends to other estimators. Empirical results across six CNNs and four MAC configurations show near-baseline accuracy with significant power savings (up to ~46%) and modest hardware overhead, illustrating a practical path to energy-efficient edge AI. The approach broadens the design space for DNN accelerators by enabling aggressive approximate hardware while maintaining acceptable accuracy, with broad implications for low-power AI devices.
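To make the idea of a correction term V concrete, here is a minimal behavioral sketch for a perforated multiplier. It is not the authors' formulation; it assumes that the perforated operand is an unsigned 8-bit activation, that weights are signed 8-bit values fixed at inference time, and that the m dropped activation bits are uniformly distributed. Under those assumptions the expected dropped value per product is 2^s(2^m - 1)/2, so V reduces to that constant times the sum of the filter weights, which can be precomputed offline. All function names (`perforated_mul`, `control_variate`, `approx_dot`, `mean_error`) are hypothetical.

```python
import random


def perforated_mul(w: int, a: int, m: int, s: int = 0) -> int:
    """Approximate w * a by skipping the m partial products w * a_j * 2**j
    for activation bits j = s .. s+m-1 (assumption: the unsigned activation
    a is the perforated operand)."""
    dropped_bits = ((1 << m) - 1) << s  # activation bits whose partial products are skipped
    return w * (a & ~dropped_bits)


def control_variate(weights: list[int], m: int, s: int = 0) -> float:
    """Hypothetical correction V: expected value of the dropped terms,
    assuming the perforated activation bits are uniform and independent.
    It depends only on the weights, so it can be computed offline."""
    mean_dropped = ((1 << m) - 1) * (1 << s) / 2  # E[a & dropped_bits]
    return sum(weights) * mean_dropped


def approx_dot(weights, acts, m, s=0, corrected=True):
    """One convolution window computed with perforated products, optionally plus V."""
    acc = sum(perforated_mul(w, a, m, s) for w, a in zip(weights, acts))
    return acc + control_variate(weights, m, s) if corrected else acc


def mean_error(n=64, trials=2000, m=3, seed=0):
    """Average error of the plain and corrected estimates over random activations."""
    rng = random.Random(seed)
    weights = [rng.randint(-128, 127) for _ in range(n)]  # fixed filter, as during inference
    err_plain = err_cv = 0.0
    for _ in range(trials):
        acts = [rng.randint(0, 255) for _ in range(n)]  # unsigned 8-bit activations
        exact = sum(w * a for w, a in zip(weights, acts))
        err_plain += approx_dot(weights, acts, m, corrected=False) - exact
        err_cv += approx_dot(weights, acts, m) - exact
    return err_plain / trials, err_cv / trials


if __name__ == "__main__":
    plain, cv = mean_error()
    print(f"mean error without V: {plain:.1f}, with V: {cv:.1f}")
```

Without V the perforated products are biased (every dropped term is non-negative in the activation), so the error accumulates in proportion to the weight sum; adding the constant V cancels that bias on average, which mirrors the zero-mean property claimed above. In this sketch V is a pure offline constant; the paper's hardware-level correction may differ.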

Abstract

In this work, we present a control variate approximation technique that enables the exploitation of highly approximate multipliers in Deep Neural Network (DNN) accelerators. Our approach does not require retraining and significantly decreases the error induced by approximate multiplications, improving the overall inference accuracy. As a result, our approach makes it possible to satisfy tight accuracy-loss constraints while increasing power savings. Our experimental evaluation, across six different DNNs and several approximate multipliers, demonstrates the versatility of our approach and shows that, compared to the accurate design, our control variate approximation achieves the same performance, 45% power reduction, and less than 1% average accuracy loss. Compared to the corresponding approximate designs without our technique, our approach improves the accuracy by 1.9× on average.

Paper Structure

This paper contains 25 sections, 36 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Partial product reduction stages for a) the accurate multiplier and b) the approximate perforated multiplier with $m=3$ and $s=0$.
  • Figure 2: The principle of the Recursive Approximate Multiplier: composing a large multiplier from smaller inaccurate building blocks.
  • Figure 3: The truncated multiplier with $m = 7$
  • Figure 4: Weight distribution of randomly selected filters of various NNs. Four examples are depicted. Figure obtained from zervakis2021control.
  • Figure 5: a) The accurate systolic MAC array and b) its MAC unit. Figure obtained from zervakis2021control.
  • ...and 5 more figures
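The recursive (Figure 2) and truncated (Figure 3) multipliers can be illustrated with short behavioral models. This is a sketch under stated assumptions, not the designs evaluated in the paper: the truncated model simply discards every partial-product bit in the m least significant columns with no compensation constant, and the recursive model builds an 8×8 unsigned multiplier from 2×2 blocks, using a hypothetical inaccurate 2×2 block that outputs 7 instead of 9 for 3×3 (a common choice in the approximate-multiplier literature). The names `truncated_mul`, `approx_2x2`, and `recursive_mul` are illustrative.

```python
def truncated_mul(a: int, b: int, m: int, bits: int = 8) -> int:
    """Unsigned bits x bits product that drops every partial-product bit
    a_i * b_j * 2**(i+j) in the m least significant columns (i + j < m).
    Assumption: plain truncation, no compensation constant."""
    acc = 0
    for i in range(bits):
        for j in range(bits):
            if (a >> i) & 1 and (b >> j) & 1 and i + j >= m:
                acc += 1 << (i + j)
    return acc


def approx_2x2(a: int, b: int) -> int:
    """Hypothetical inaccurate 2x2 block: 3 x 3 -> 7 (fits in 3 bits), exact otherwise."""
    return 7 if a == 3 and b == 3 else a * b


def recursive_mul(a: int, b: int, bits: int = 8) -> int:
    """Compose a bits x bits multiplier (bits a power of two) from four half-width ones:
    a*b = aH*bH*2**bits + (aH*bL + aL*bH)*2**(bits/2) + aL*bL,
    recursing down to the inaccurate 2x2 block."""
    if bits == 2:
        return approx_2x2(a, b)
    h = bits // 2
    low = (1 << h) - 1
    ah, al = a >> h, a & low  # high and low halves of each operand
    bh, bl = b >> h, b & low
    return ((recursive_mul(ah, bh, h) << bits)
            + ((recursive_mul(ah, bl, h) + recursive_mul(al, bh, h)) << h)
            + recursive_mul(al, bl, h))


if __name__ == "__main__":
    for a, b in [(255, 255), (200, 123)]:
        print(a * b, truncated_mul(a, b, m=7), recursive_mul(a, b))
```

Both models only ever underestimate the exact product, so, like the perforated multiplier, their convolution error has a non-zero mean, which is the kind of bias a control variate correction targets.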