ZOBNN: Zero-Overhead Dependable Design of Binary Neural Networks with Deliberately Quantized Parameters

Behnam Ghavami, Mohammad Shahidzadeh, Lesley Shannon, Steve Wilton

TL;DR

The paper tackles the reliability of deep neural networks on edge hardware subject to memory faults, proposing a zero-overhead, quantization-based fault-tolerance approach. It introduces ZOBNN, a selectively quantized binary neural network that bounds the range of its floating-point parameters and reworks the quantization/dequantization layers so that no extra computation is needed. Fault-injection experiments with FracBNN and DoReFaNet on CIFAR-10 and ImageNet show about a 5X improvement in robustness over conventional floating-point networks, with notable accuracy retention at fault rates up to $1\times 10^{-4}$. The approach also reduces memory significantly (by up to 34% with 12-bit quantization on ImageNet) while incurring no runtime overhead, making it well suited to dependable, real-time edge AI applications.
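The memory-saving claim can be reasoned about with a simple storage model. The sketch below rests on our own assumption, not the paper's accounting: parameter storage consists of binarized weights at 1 bit each plus floating-point parameters at 32 bits each, and ZOBNN replaces the latter with k-bit codes. The helpers `storage_bits` and `memory_saving` are hypothetical names used only for illustration.

```python
def storage_bits(n_binary: int, n_float: int, float_bits: int) -> int:
    """Parameter storage in bits: 1 bit per binarized weight plus float_bits per float parameter.

    Assumed model for illustration only; activations and metadata are not counted.
    """
    return n_binary + n_float * float_bits


def memory_saving(n_binary: int, n_float: int, k: int, float_bits: int = 32) -> float:
    """Fraction of parameter storage saved by storing float parameters as k-bit codes.

    Ignores the few per-tensor scale/offset values a uniform quantizer also keeps.
    """
    before = storage_bits(n_binary, n_float, float_bits)
    after = storage_bits(n_binary, n_float, k)
    return 1.0 - after / before


if __name__ == "__main__":
    # Upper bound: if every parameter were floating point, 12-bit codes save 1 - 12/32.
    print(memory_saving(n_binary=0, n_float=1_000, k=12))  # 0.625
    # Hypothetical mix: the saving shrinks as binarized weights dominate storage.
    print(memory_saving(n_binary=1_000_000, n_float=10_000, k=12))
```

Under this model the saving from 12-bit codes can never exceed 1 - 12/32 = 62.5%, and it shrinks as binarized weights take up a larger share of storage. Exactly which storage the reported 34% is measured over is not stated in this summary.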

Abstract

Low-precision weights and activations in deep neural networks (DNNs) offer better hardware efficiency than their full-precision counterparts. When networks are implemented with low-precision operations, and especially in the extreme case where parameters are binarized (i.e., binary neural networks, BNNs), the two most frequently cited benefits of quantization are reduced memory consumption and faster inference. In this paper, we introduce a third advantage of very low-precision neural networks: improved fault tolerance. We investigate the impact of memory faults on state-of-the-art BNNs through a comprehensive analysis. BNN architectures retain some floating-point parameters to improve accuracy, and our findings reveal that BNNs are highly sensitive to deviations in these parameters caused by memory faults. In light of this finding, we propose a technique that improves BNN dependability by restricting the range of the floating-point parameters through a novel, deliberate uniform quantization. This quantization reduces the proportion of floating-point parameters in the BNN without adding any computational overhead at inference time. Extensive fault-simulation experiments on the proposed BNN architecture, ZOBNN, reveal a 5X improvement in robustness over a conventional floating-point DNN, again with no computational overhead. ZOBNN is therefore well suited to critical edge applications that have limited computational resources and demand both dependability and real-time performance.
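The abstract's central observation, that faults in floating-point parameters are far more damaging than faults in range-restricted quantized ones, can be illustrated with single-bit flips. This is a sketch under assumptions, not the paper's fault model: it stores one value as IEEE-754 float32 and, separately, as a k-bit uniform code over an assumed range [lo, hi] = [-1, 1], flips each bit in turn, and reports the worst deviation. All helper names are hypothetical.

```python
import math
import struct


def f32_to_bits(x: float) -> int:
    """IEEE-754 binary32 bit pattern of x (after rounding x to float32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(b: int) -> float:
    """Inverse of f32_to_bits."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def worst_single_flip_float32(x: float) -> float:
    """Largest |x' - x| over all 32 single-bit flips of x stored as float32."""
    bits = f32_to_bits(x)
    clean = bits_to_f32(bits)
    worst = 0.0
    for bit in range(32):
        corrupted = bits_to_f32(bits ^ (1 << bit))
        if not math.isfinite(corrupted):
            return math.inf  # the flip produced Inf/NaN: unbounded error
        worst = max(worst, abs(corrupted - clean))
    return worst


def worst_single_flip_quantized(x: float, lo: float = -1.0, hi: float = 1.0, k: int = 8) -> float:
    """Largest |x' - x| over all k single-bit flips of a k-bit uniform code for x in [lo, hi]."""
    levels = 2**k - 1
    scale = (hi - lo) / levels
    code = min(max(round((x - lo) / scale), 0), levels)  # clamp to the representable range
    clean = lo + code * scale
    worst = 0.0
    for bit in range(k):
        # A flipped code is still in [0, levels], so the value stays inside [lo, hi].
        corrupted = lo + (code ^ (1 << bit)) * scale
        worst = max(worst, abs(corrupted - clean))
    return worst


if __name__ == "__main__":
    print(worst_single_flip_float32(0.75))    # exponent-MSB flip: roughly 2.55e38
    print(worst_single_flip_quantized(0.75))  # about 1.004 here, always <= hi - lo
```

For 0.75, flipping the most significant exponent bit turns the value into roughly 2.55e38, and for 1.0 the same flip yields +inf, whereas any flip of the 8-bit code moves the value by at most hi - lo. That bounded error is the intuition behind restricting the range of the floating-point parameters.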

Paper Structure (24 sections, 16 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Comparison of accuracy drop points across fault rates for float, quantized, and binary models, using the FracBNN [zhang2021fracbnn] base architecture further transformed into binary and quantized variants.
  • Figure 2: Comparison of the accuracy drop points of different layers under various fault rates in the FracBNN [zhang2021fracbnn] architecture.
  • Figure 3: Comprehensive fault study of the float, baseline, and proposed ZOBNN-based FracBNN [zhang2021fracbnn] and DoReFaNet [zhou2016dorefa] models.
  • Figure 4: Comparison of the accuracy distribution of the FracBNN [zhang2021fracbnn] model with its selectively quantized ZOBNN counterpart. The distribution was drawn from 500 samples under three different fault rates.
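The Figure 4 caption describes distributions drawn from 500 fault samples. A parameter-level analogue of that sampling can be sketched as follows. It is a toy under stated assumptions, not the authors' fault-injection framework: each stored bit flips independently with probability `ber` (our reading of "fault rate"), the parameters are a synthetic vector in [-1, 1], and the per-trial metric is the maximum parameter deviation rather than model accuracy. All function names are hypothetical.

```python
import math
import random
import struct


def f32_to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]


def flip_bits(word: int, width: int, ber: float, rng: random.Random) -> int:
    """Flip each of the `width` low bits of `word` independently with probability `ber`."""
    for bit in range(width):
        if rng.random() < ber:
            word ^= 1 << bit
    return word


def max_deviation(clean, corrupted) -> float:
    """Largest |corrupted - clean|; an Inf/NaN value counts as unbounded."""
    worst = 0.0
    for a, b in zip(clean, corrupted):
        if not math.isfinite(b):
            return math.inf
        worst = max(worst, abs(b - a))
    return worst


def monte_carlo(params, ber, trials=500, seed=0, lo=-1.0, hi=1.0, k=8):
    """Per-trial max deviation for float32 storage vs. k-bit uniform codes over [lo, hi]."""
    rng = random.Random(seed)
    levels = 2**k - 1
    scale = (hi - lo) / levels
    f_bits = [f32_to_bits(p) for p in params]
    f_clean = [bits_to_f32(b) for b in f_bits]
    codes = [min(max(round((p - lo) / scale), 0), levels) for p in params]
    q_clean = [lo + c * scale for c in codes]
    f_devs, q_devs = [], []
    for _ in range(trials):
        f_bad = [bits_to_f32(flip_bits(b, 32, ber, rng)) for b in f_bits]
        q_bad = [lo + flip_bits(c, k, ber, rng) * scale for c in codes]
        f_devs.append(max_deviation(f_clean, f_bad))
        q_devs.append(max_deviation(q_clean, q_bad))
    return f_devs, q_devs


if __name__ == "__main__":
    rng = random.Random(1)
    params = [rng.uniform(-1.0, 1.0) for _ in range(128)]
    f_devs, q_devs = monte_carlo(params, ber=1e-3)
    # Share of trials where float32 corruption exceeds the quantized worst case (hi - lo).
    print(sum(d > 2.0 for d in f_devs) / len(f_devs))
    print(max(q_devs))  # at most hi - lo = 2.0, up to floating-point rounding
```

The default `trials=500` only echoes the sample count in the caption; a real study would inject faults into the stored model and measure classification accuracy per trial.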