Precision Neural Network Quantization via Learnable Adaptive Modules

Wenqiang Zhou, Zhendong Yu, Xinyu Liu, Jiaming Yang, Rong Xiao, Tao Wang, Chenwei Tang, Jiancheng Lv

TL;DR

This paper tackles the accuracy degradation caused by low-bit quantization by introducing Adaptive Step Size Quantization (ASQ) for activations and a Power Of Square root of Two (POST) non-uniform quantization scheme for weights. ASQ uses a lightweight adapter to dynamically adjust the activation quantization step size, aligning quantization with varying activation distributions during training. POST mitigates the rigid resolution of Power of Two (POT) quantization and stays hardware-friendly through a look-up table (LUT). In experiments on ImageNet and CIFAR-10 with ResNet and MobileNet-V2, ASQ consistently outperforms state-of-the-art quantization-aware training (QAT) methods. In some cases it matches or slightly surpasses full-precision baselines at 4-bit precision. Combining ASQ and POST delivers both higher accuracy and practical inference efficiency. Ablations and visual analyses support this and underscore the value of distribution-aware activation and weight quantization.
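
To make the ASQ idea concrete, here is a minimal forward-pass sketch in Python using only the standard library. It is not the authors' implementation. Several details are our own assumptions:

- The adapter's inputs are the mean and max absolute activation.
- A ReLU sits between its two linear layers.
- Its hidden width is 4.
- Its output is squashed to $\beta \in (0, 2)$.

The names `TinyAdapter`, `fake_quantize`, and `asq_fake_quantize` are hypothetical. Only the overall structure follows the summary. A small two-layer adapter (the Figure 2 caption calls it a two-layer linear adapter) produces a factor $\beta$. That factor multiplies a trainable base step $s$, and the resulting step drives the fake quantization.

```python
import math
import random


def fake_quantize(x, step, bits):
    """Unsigned uniform fake quantization: clip x/step to [0, 2^bits - 1], round, rescale."""
    qmax = 2 ** bits - 1
    out = []
    for v in x:
        q = min(max(v / step, 0.0), float(qmax))
        out.append(math.floor(q + 0.5) * step)  # round half up, then dequantize
    return out


class TinyAdapter:
    """Hypothetical two-layer adapter mapping activation statistics to a positive factor beta."""

    def __init__(self, n_in=2, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def __call__(self, feats):
        # Layer 1 (assumed ReLU), then layer 2 down to a single logit z.
        hidden = [max(0.0, sum(w * f for w, f in zip(row, feats)) + b)
                  for row, b in zip(self.w1, self.b1)]
        z = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        # Assumption: squash to beta in (0, 2) so that z = 0 gives beta = 1 (step unchanged).
        return 2.0 / (1.0 + math.exp(-z))


def asq_fake_quantize(x, s, adapter, bits):
    """Scale the learnable base step s by the input-dependent beta, then fake-quantize."""
    # Assumption: the adapter sees the mean and max absolute value of the activation tensor.
    feats = [sum(abs(v) for v in x) / len(x), max(abs(v) for v in x)]
    beta = adapter(feats)
    return fake_quantize(x, beta * s, bits), beta


if __name__ == "__main__":
    acts = [0.0, 0.12, 0.5, 0.9, 1.7, 3.2]  # e.g. post-ReLU activations
    xq, beta = asq_fake_quantize(acts, s=0.25, adapter=TinyAdapter(), bits=3)
    print(f"beta = {beta:.4f}", [round(v, 4) for v in xq])
```

In actual QAT, $s$ and the adapter weights would be trained jointly with the network. The rounding step typically uses a straight-through estimator for gradients. This forward-only sketch omits gradients entirely.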

Abstract

Quantization-Aware Training (QAT) is a neural network quantization technique that compresses model size and improves operational efficiency while effectively maintaining model performance. QAT inserts fake quantization operators during training, allowing the model to learn to compensate for the information lost to quantization. Making the quantization parameters trainable can significantly improve QAT performance. However, this comes at the cost of reduced flexibility during inference, especially when activations follow substantially different distributions. In this paper, we propose an effective learnable adaptive neural network quantization method, called Adaptive Step Size Quantization (ASQ), to resolve this conflict. Specifically, ASQ first dynamically adjusts the quantization scaling factors through a trained module that adapts to different activations. Then, to address the rigid resolution issue inherent in Power of Two (POT) quantization, we propose an efficient non-uniform quantization scheme. We use the Power Of Square root of Two (POST) as the basis for exponential quantization. POST effectively handles the bell-shaped distribution of neural network weights across various bit-widths while maintaining computational efficiency through a look-up table (LUT). Extensive experimental results demonstrate that ASQ is superior to state-of-the-art QAT approaches. Notably, ASQ is even competitive with full-precision baselines: its 4-bit quantized ResNet34 model improves accuracy by 1.2% on ImageNet.
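
The abstract does not spell out how the LUT keeps POST efficient. One plausible scheme, sketched below under our own assumptions, uses the identity $(\sqrt{2})^{-k} = 2^{-\lfloor k/2 \rfloor} \cdot (1/\sqrt{2})^{k \bmod 2}$:

- The even part of the exponent becomes a right shift.
- The odd part is read from a small table of precomputed fixed-point products $a \cdot (1/\sqrt{2})$, one for every low-bit activation code $a$.

The fixed-point width `FRAC_BITS`, the `(sign, k)` weight encoding, and the functions `build_lut`, `post_mul`, and `post_dot` are hypothetical. A per-layer scale $\alpha$ would be applied once after accumulation and is omitted here.

```python
import math

FRAC_BITS = 12  # assumed fixed-point fraction width for LUT entries


def build_lut(act_bits, frac_bits=FRAC_BITS):
    """lut[p][a] = round(a * (1/sqrt(2))**p * 2**frac_bits) for p in {0, 1} and every activation code a."""
    scales = (1.0, 1.0 / math.sqrt(2.0))
    return [[round(a * s * (1 << frac_bits)) for a in range(1 << act_bits)] for s in scales]


def post_mul(a_code, k, sign, lut):
    """Fixed-point product of an unsigned activation code and the weight level sign * sqrt(2)**(-k).

    The odd factor (1/sqrt(2))**(k % 2) comes from the LUT; the even factor 2**(-(k // 2))
    is a truncating right shift, so the magnitude path needs no multiplier. sign is -1, 0, or +1.
    """
    return sign * (lut[k % 2][a_code] >> (k // 2))


def post_dot(a_codes, w_codes, lut):
    """Accumulate activation codes against (sign, k)-encoded POST weights with lookups, shifts, and adds."""
    return sum(post_mul(a, k, sign, lut) for a, (sign, k) in zip(a_codes, w_codes))


if __name__ == "__main__":
    lut = build_lut(act_bits=4)
    acts = [3, 15, 0, 9]                         # 4-bit unsigned activation codes
    weights = [(1, 0), (-1, 3), (1, 5), (1, 2)]  # each pair encodes sign * sqrt(2)**(-k)
    approx = post_dot(acts, weights, lut) / (1 << FRAC_BITS)
    exact = sum(a * s * 2 ** (-k / 2) for a, (s, k) in zip(acts, weights))
    print(f"LUT+shift: {approx:.5f}  float reference: {exact:.5f}")
```

With truncating shifts, each product here stays within about 1.5 units of $2^{-12}$ of the exact value. A hardware design could instead use rounding shifts or a wider fraction.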

Paper Structure

This paper contains 23 sections, 15 equations, 6 figures, 5 tables, and 1 algorithm.

Figures (6)

  • Figure 1: (a) The data distributions of different images (model inputs) exhibit significant variations. (b) The distribution of weights in a convolutional layer or linear layer of a deep neural network generally exhibits a bell-shaped curve centered around a mean close to zero.
  • Figure 2: An overview of the proposed method. We optimize quantization for both activations and weights. For activations, a two-layer linear adapter produces a factor $\beta$, which multiplies a trainable parameter $s$ to dynamically adjust the quantization step size. For weights, POST quantization addresses the rigid resolution issue and better fits the bell-shaped distribution.
  • Figure 3: Comparison of 4-bit quantization levels among Uniform, POT, and POST quantization.
  • Figure 4: Histograms of the activation distribution before and after quantization with ASQ and LSQ at 2-bit and 3-bit precision. $\beta$ is the output of the adaptive module used to adjust the quantization step size.
  • Figure 5: Output error (error accumulation) in blocks 2-9 of ResNet20 with the ASQ and LSQ quantizers, respectively.
  • ...and 1 more figure
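
The Figure 3 comparison can be approximated numerically. The sketch below builds signed 4-bit grids under a convention we assume for illustration: one sign bit, a zero level, and seven magnitude levels normalized to a maximum of 1. The three grids are:

- Uniform: steps of $1/7$.
- POT: levels $2^{-k}$.
- POST: levels $(\sqrt{2})^{-k}$.

Both POT and POST use $k = 0, \dots, 6$. The paper's exact level sets and scaling may differ. The helpers `uniform_levels`, `pot_levels`, `post_levels`, `signed`, `quantize`, and `max_gap` are hypothetical names.

```python
def uniform_levels(bits):
    """Zero plus 2^(bits-1) - 1 evenly spaced magnitudes in (0, 1] (one bit reserved for sign)."""
    n = 2 ** (bits - 1) - 1
    return [0.0] + [i / n for i in range(1, n + 1)]


def pot_levels(bits):
    """Zero plus magnitudes 2^0, 2^-1, ..., 2^-(n-1)."""
    n = 2 ** (bits - 1) - 1
    return [0.0] + [2.0 ** (-k) for k in range(n)]


def post_levels(bits):
    """Zero plus magnitudes sqrt(2)^0, sqrt(2)^-1, ..., i.e. 2^(-k/2)."""
    n = 2 ** (bits - 1) - 1
    return [0.0] + [2.0 ** (-k / 2) for k in range(n)]


def signed(levels):
    """Mirror the nonnegative magnitudes to build the full signed grid (zero appears once)."""
    return sorted(set(levels) | {-v for v in levels if v > 0})


def quantize(w, grid, alpha=1.0):
    """Project w onto the nearest level of the alpha-scaled grid."""
    return min((alpha * v for v in grid), key=lambda q: abs(q - w))


def max_gap(levels):
    """Largest distance between adjacent levels, a crude measure of worst-case resolution."""
    pos = sorted(levels)
    return max(b - a for a, b in zip(pos, pos[1:]))


if __name__ == "__main__":
    for name, fn in (("uniform", uniform_levels), ("POT", pot_levels), ("POST", post_levels)):
        lv = fn(4)
        print(f"{name:8s} levels={[round(v, 4) for v in lv]} max_gap={max_gap(lv):.4f}")
```

Under this convention, the largest gap in the POT grid is 0.5, between 1 and 0.5. POST halves the logarithmic spacing and shrinks that gap to about 0.29. It also places five of its seven nonzero magnitudes at or above 0.25 instead of three. This is one way to read the "rigid resolution" of POT that POST is designed to relax.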