
Hardware Efficient Approximate Convolution with Tunable Error Tolerance for CNNs

Vishal Shashidhar, Anupam Kumari, Roy P Paily

TL;DR

This work proposes a "soft sparsity" paradigm that uses a hardware-efficient Most Significant Bit (MSB) proxy to skip negligible non-zero multiplications, reducing the multiply-accumulate (MAC) workload of resource-constrained inference.

Abstract

The high computational demands of modern CNNs hinder edge deployment, and traditional "hard" sparsity (skipping exact mathematical zeros) loses effectiveness in deep layers or with smooth activations such as Tanh. We propose a "soft sparsity" paradigm that uses a hardware-efficient Most Significant Bit (MSB) proxy to skip negligible non-zero multiplications. Integrated as a custom RISC-V instruction and evaluated on LeNet-5 (MNIST), this method reduces ReLU MACs by 88.42% and Tanh MACs by 74.87% with zero accuracy loss, outperforming zero-skipping by 5x. By clock-gating inactive multipliers, we estimate power savings of 35.2% for ReLU and 29.96% for Tanh. Because memory access still consumes power, the power reduction is sub-linear in the operation savings; even so, the approach substantially reduces the cost of resource-constrained inference.
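The abstract does not spell out how the MSB proxy is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes signed integer (fixed-point) operands and uses the sum of the operands' bit lengths as a cheap upper bound on the product magnitude: if |a| < 2^p and |w| < 2^q, then |a*w| < 2^(p+q). The names `msb_skip`, `approx_dot`, and `t_bits` are hypothetical and chosen for illustration.

```python
def msb_skip(a: int, w: int, t_bits: int) -> bool:
    """Return True if the product a*w is guaranteed to satisfy |a*w| < 2**t_bits.

    Assumption: the MSB position is used as a magnitude proxy. bit_length()
    is one more than the index of the leading 1 bit, so |x| < 2**bit_length(x).
    """
    if a == 0 or w == 0:
        # "Hard" sparsity: exact zeros are always skipped.
        return True
    # "Soft" sparsity: skip when the bound on |a*w| falls below the threshold.
    return abs(a).bit_length() + abs(w).bit_length() <= t_bits


def approx_dot(acts, weights, t_bits):
    """Dot product that skips MACs flagged by msb_skip.

    Returns (accumulated sum, number of multiplications actually executed).
    """
    acc = 0
    executed = 0
    for a, w in zip(acts, weights):
        if msb_skip(a, w, t_bits):
            continue  # in hardware, the multiplier would be clock-gated here
        acc += a * w
        executed += 1
    return acc, executed


# Illustrative 8-bit operands (made-up values, not data from the paper).
acts = [0, 3, 100, -2, 64]
weights = [50, 2, -90, 1, 0]

exact = sum(a * w for a, w in zip(acts, weights))    # -8996
approx, executed = approx_dot(acts, weights, t_bits=6)  # (-9000, 1)
```

In this toy example only one of five multiplications is executed, and the result differs from the exact sum by 4. Under the stated assumption each skipped product is smaller than 2**t_bits in magnitude, so the total error is bounded by the number of skipped MACs times 2**t_bits; setting `t_bits=0` falls back to plain zero-skipping.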

Paper Structure (12 sections, 9 equations, 5 figures, 6 tables)


Figures (5)

  • Figure 1: The left and right sides show subsequent feature maps with Tanh and ReLU activations, respectively.
  • Figure 2: Visual demonstration of outputs with different error thresholds.
  • Figure 3: Distribution of fractional pixel errors for each of the four error thresholds: top left (T = 0.03), top right (T = 0.06), bottom left (T = 0.1), bottom right (T = 0.25).
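  • Figure 4: At T = 0.3, executing only 11.58% of the total MACs preserves accuracy (matching the 88.42% ReLU MAC reduction).
  • Figure 5: At T = 0.2, executing only 25.13% of the total MACs preserves accuracy (matching the 74.87% Tanh MAC reduction).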