ZOBNN: Zero-Overhead Dependable Design of Binary Neural Networks with Deliberately Quantized Parameters

Behnam Ghavami; Mohammad Shahidzadeh; Lesley Shannon; Steve Wilton

TL;DR

The paper tackles the reliability of deep neural networks on edge hardware under memory faults, proposing a zero-overhead, quantization-based fault-tolerance approach. It introduces ZOBNN, a selectively quantized Binary Neural Network that bounds the range of floating-point parameters by reworking the quantization/dequantization layers so that no extra compute is needed. Fault-injection experiments on FracBNN and DoReFaNet with CIFAR-10 and ImageNet show about a 5X improvement in robustness over conventional floating-point networks, with notable accuracy retention at fault rates up to $1\times 10^{-4}$. The approach also reduces memory use (by up to 34% with 12-bit quantization on ImageNet) while incurring no runtime overhead, making it well suited to dependable, real-time edge AI applications.

Abstract

Low-precision weights and activations in deep neural networks (DNNs) outperform their full-precision counterparts in terms of hardware efficiency. When networks are implemented with low-precision operations, specifically in the extreme case where network parameters are binarized (i.e., BNNs), the two most frequently mentioned benefits of quantization are reduced memory consumption and faster inference. In this paper, we introduce a third advantage of very low-precision neural networks: improved fault tolerance. We investigate the impact of memory faults on state-of-the-art binary neural networks (BNNs) through comprehensive analysis. Although floating-point parameters are included in BNN architectures to improve accuracy, our findings reveal that BNNs are highly sensitive to deviations in these parameters caused by memory faults. In light of this finding, we propose a technique to improve BNN dependability by restricting the range of float parameters through a novel, deliberately uniform quantization. The introduced quantization technique reduces the proportion of floating-point parameters used in the BNN without incurring any additional computational overhead during inference. Extensive fault-simulation experiments on the proposed BNN architecture (i.e., ZOBNN) reveal a remarkable 5X enhancement in robustness compared to conventional floating-point DNNs. Notably, this improvement is achieved without incurring any computational overhead. ZOBNN excels in critical edge applications characterized by limited computational resources, prioritizing both dependability and real-time performance.
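The intuition behind restricting the range of float parameters can be illustrated with a minimal sketch. It is not the paper's implementation: the helper names, the range $[-1, 1]$, and the 8-bit width are assumptions chosen only for illustration. The sketch flips a single bit in a float32 value and in a uniformly quantized integer code, and shows that the quantized value can never leave its predefined range.

```python
import numpy as np

def flip_bit_float32(x: float, bit: int) -> float:
    """Flip one bit (0 = LSB, 31 = sign) of the float32 encoding of x."""
    raw = np.array([x], dtype=np.float32).view(np.uint32)
    raw ^= np.uint32(1 << bit)
    return float(raw.view(np.float32)[0])

def quantize(x: float, lo: float, hi: float, bits: int):
    """Uniformly quantize x into a `bits`-wide integer code over [lo, hi].

    Returns the integer code and the step size (scale).
    The range and bit width are illustrative assumptions.
    """
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    clipped = min(max(x, lo), hi)
    return int(round((clipped - lo) / scale)), scale

def dequantize(q: int, lo: float, scale: float) -> float:
    """Map an integer code back to a real value in [lo, hi]."""
    return lo + q * scale

def flip_bit_int(q: int, bit: int, bits: int) -> int:
    """Flip one bit of a `bits`-wide unsigned integer code."""
    return (q ^ (1 << bit)) & ((1 << bits) - 1)

if __name__ == "__main__":
    x = 0.5
    # A flip in the top exponent bit of a float32 changes the value
    # by many orders of magnitude.
    print("float32 fault:", flip_bit_float32(x, 30))

    # Any single-bit flip in the quantized code still decodes to a
    # value inside [lo, hi], so the deviation is bounded by hi - lo.
    lo, hi, bits = -1.0, 1.0, 8
    q, scale = quantize(x, lo, hi, bits)
    for b in range(bits):
        print(f"bit {b}:", dequantize(flip_bit_int(q, b, bits), lo, scale))
```

In this toy setting, the worst-case deviation of the quantized parameter is the width of the chosen range, whereas a single exponent-bit flip in the float32 encoding can push the value far outside any range seen during training.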

Paper Structure (24 sections, 16 equations, 4 figures, 2 tables)

Introduction
Related Work
Preliminaries and Definitions
Deep Neural Networks
Integer Only Quantization of Neural Networks
Binary Neural Networks
Motivation: Reliability analysis of Conventional BNNs
Proposed Selective Quantization of BNNs
Dequantization and Quantization Layers
Input Layer
Output Layer
Binarized Layer
Other Layers via SOTA BNN
batch-norm
rprelu
...and 9 more sections

Figures (4)

Figure 1: Comparison of accuracy drop points across fault rates for float, quantized, and binary models, using FracBNN [zhang2021fracbnn] as the base architecture, which is further transformed into binary and quantized variants.
Figure 2: Comparison of the accuracy drop points of different layers under various fault rates in the FracBNN [zhang2021fracbnn] architecture.
Figure 3: Comprehensive fault study of the float, baseline, and proposed ZOBNN-based versions of FracBNN [zhang2021fracbnn] and DoReFaNet [zhou2016dorefa].
Figure 4: Comparison of the accuracy distribution of the FracBNN [zhang2021fracbnn] model with that of its selectively quantized counterpart under ZOBNN. Each distribution was drawn from 500 different samples under three different fault rates.
