Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN Inference

Akshat Ramachandran; Zishen Wan; Geonhwa Jeong; John Gustafson; Tushar Krishna

TL;DR

The paper introduces Logarithmic Posits (LP), an adaptive, hardware-friendly data type inspired by posits that adapts to DNN weight/activation distributions by parameterizing its bit fields, together with LP Quantization (LPQ), a genetic-algorithm-based framework that searches for optimal layer-wise LP parameters.

Abstract

Traditional Deep Neural Network (DNN) quantization methods using integer, fixed-point, or floating-point data types struggle to capture diverse DNN parameter distributions at low precision, and often require large silicon overhead and intensive quantization-aware training. In this study, we introduce Logarithmic Posits (LP), an adaptive, hardware-friendly data type inspired by posits that dynamically adapts to DNN weight/activation distributions by parameterizing LP bit fields. We also develop a novel genetic-algorithm-based framework, LP Quantization (LPQ), to find optimal layer-wise LP parameters while reducing representational divergence between quantized and full-precision models through a novel global-local contrastive objective. Additionally, we design a unified mixed-precision LP accelerator (LPA) architecture comprising processing elements (PEs) that incorporate LP in the computational datapath. Our algorithm-hardware co-design demonstrates, on average, a <1% drop in top-1 accuracy across various CNN and ViT models. It also achieves ~2x improvements in performance per unit area and 2.2x gains in energy efficiency compared to state-of-the-art quantization accelerators using different data types.

Paper Structure (13 sections, 3 equations, 6 figures, 4 tables)

Introduction
Background and related work
LP: Logarithmic Posits
LPQ: LP Quantization Framework
Fitness Function
LPA: LP-based DNN accelerator
Architecture Overview
PE Architecture
Evaluation
Effectiveness of LPQ
Effectiveness of LPA
Conclusion
Acknowledgements

Figures (6)

Figure 1: (a) Weight distributions of ResNet50 and ViT (De: Decoder, En: Encoder) layers, (b) LP's relative-accuracy plot, showing distribution-aware properties compared to AF (Tambe et al., 2020).
Figure 2: Overview of LPQ Framework illustrating the four major steps and evaluation of fitness function.
Figure 3: LPA Architecture depicting detailed LP PE and Unified LP Decoder units.
Figure 4: Architecture of mixed-precision 2's complementer (a) and Leading Zero Detector (LZD) (b).
Figure 5: (a) LPQ performance with various loss functions, (b) RMSE distribution of quantization error of different formats.
...and 1 more figure
