Stabilizing Quantization-Aware Training by Implicit-Regularization on Hessian Matrix

Junbiao Pang, Tianyang Cai

TL;DR

This work addresses the instability of Quantization-Aware Training (QAT), which arises because unavoidable quantization errors induce sharp loss landscapes. It proposes Feature-Perturbed Quantization (FPQ), which injects stochastic perturbations into layer inputs (Stochastic Feature Perturbations) and uses Channel-wise Standardization Distillation to align feature distributions between the quantized and full-precision (FP) models, thereby implicitly regularizing the Hessian and promoting flat minima. Theoretical and empirical analyses show that FPQ reduces the Hessian norm (e.g., the trace $\operatorname{Tr}(\nabla_{\boldsymbol{w}}^2 L)$) and outperforms state-of-the-art QAT methods and FP baselines on CIFAR-10/100 across multiple architectures, with ablations confirming that the benefits of perturbation and distillation are additive. The results suggest that FPQ is a practical, architecture-agnostic way to train stable, high-accuracy quantized models for deployment on edge devices.

Abstract

Quantization-Aware Training (QAT) is one of the prevailing neural network compression solutions. However, its stability has been questioned because the inevitable quantization error can degrade performance. We find that a sharp loss landscape, which leads to a dramatic performance drop, is an essential cause of this instability. Theoretically, we show that perturbing the features leads to flat local minima. However, simply adding perturbations to either the weights or the features empirically degrades the performance of the Full Precision (FP) model. In this paper, we propose Feature-Perturbed Quantization (FPQ), which stochastically perturbs the features and applies feature distillation to the quantized model. Our method generalizes well to different network architectures and various QAT methods. Furthermore, we show mathematically that FPQ implicitly regularizes the Hessian norm, which governs the smoothness of the loss landscape. Extensive experiments demonstrate that our approach significantly outperforms the current State-Of-The-Art (SOTA) QAT methods and even the FP counterparts.
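
The two ingredients named above, stochastic feature perturbation and feature distillation between the quantized and FP models, can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the paper's formulation: the Gaussian noise, the scale `sigma`, the application probability `p`, the mean-squared-error form of the loss, and the function names `perturb_features`, `channel_standardize`, and `csd_loss` are all hypothetical choices made here.

```python
import numpy as np

def perturb_features(x, sigma=0.1, p=0.5, rng=None):
    """Add zero-mean Gaussian noise to a feature map with probability p.

    Assumed form: the noise distribution and the parameters sigma and p
    are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() >= p:
        return x  # leave the features untouched on this step
    return x + sigma * rng.standard_normal(x.shape)

def channel_standardize(f, eps=1e-5):
    """Standardize an (N, C, H, W) feature map per channel over N, H, W."""
    mean = f.mean(axis=(0, 2, 3), keepdims=True)
    std = f.std(axis=(0, 2, 3), keepdims=True)
    return (f - mean) / (std + eps)

def csd_loss(f_quant, f_fp):
    """Mean squared error between channel-wise standardized features (assumed loss form)."""
    diff = channel_standardize(f_quant) - channel_standardize(f_fp)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
f_fp = rng.standard_normal((8, 4, 5, 5))   # stand-in for an FP feature map
f_q = 2.0 * f_fp + 3.0                     # same distribution shape, different scale and shift
f_noisy = perturb_features(f_fp, sigma=0.5, p=1.0, rng=rng)

print(csd_loss(f_q, f_fp))      # close to zero: standardization removes scale and shift
print(csd_loss(f_noisy, f_fp))  # positive: the perturbation changes the feature shape
```

Standardizing per channel before comparing means the loss ignores a pure rescaling or shift of the quantized features and penalizes only a change in the shape of the distribution, which is one plausible reading of "aligning feature distributions".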

Paper Structure

This paper contains 17 sections, 1 theorem, 21 equations, 5 figures, 5 tables, 1 algorithm.

Key Result

Proposition 1

(Injecting noise into multiple layers results in a biased expectation) Without loss of generality, given a DNN with $N$ ($N \ge 2$) layers, if noises $\boldsymbol{\delta}^l$ ($1 \leq l \leq N$) are simultaneously injected into different layers, the accumulated perturbation from the different layers is biased.
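
A small numerical experiment can build intuition for this statement. It is not the paper's proof; it illustrates only one possible mechanism, namely that zero-mean noise injected before a nonlinearity no longer averages out at the output. The two-layer ReLU network, its random weights `W1` and `W2`, and the noise scale `sigma` are hypothetical choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))   # hypothetical layer-1 weights
W2 = rng.standard_normal((4, 16))   # hypothetical layer-2 weights
x = rng.standard_normal(8)          # a single fixed input

def forward(inp, noise1, noise2):
    """Two-layer network with additive noise on the input of each layer."""
    h = np.maximum((inp + noise1) @ W1.T, 0.0)  # layer 1 followed by ReLU
    return (h + noise2) @ W2.T                  # layer 2 (linear)

n, sigma = 20000, 0.5
clean = forward(x, np.zeros(8), np.zeros(16))

z1 = sigma * rng.standard_normal((n, 8))    # zero-mean noise for layer 1
z2 = sigma * rng.standard_normal((n, 16))   # zero-mean noise for layer 2

out_last = forward(x, np.zeros((n, 8)), z2)  # noise only at the last layer
out_both = forward(x, z1, z2)                # noise at both layers simultaneously

# Distance between the mean noisy output and the clean output.
bias_last = np.linalg.norm(out_last.mean(axis=0) - clean)
bias_both = np.linalg.norm(out_both.mean(axis=0) - clean)
print(bias_last, bias_both)
```

In this toy setting, noise that passes only through the final linear layer averages out up to sampling error, while noise that also enters before the ReLU shifts the expected output away from the clean one.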

Figures (5)

  • Figure 1: The output variances of the FP32, LSQ W4A4, LSQ W2A4, FPQ W4A4, and FPQ W2A4 models of ResNet-18 on the CIFAR-10 dataset.
  • Figure 2: A comparison of the trajectories of the gradient norm $\|\nabla_{\boldsymbol{w}} L(\boldsymbol{w})\|$ of ResNet-18 on CIFAR-10.
  • Figure 3: Comparisons of the output feature of the ResNet-18 before and after quantization. The orange line represents the mean value of the feature (best viewed in color).
  • Figure 4: Feature distributions of the FP ResNet-18 (blue lines), the native baseline (LSQ, red lines), and our method (green lines) for the same feature of the ResNet-18 model on CIFAR-10. We randomly select three samples and plot the output feature of the same layer.
  • Figure 5: Trajectory of the Hessian matrix trace for ResNet-18 and MobileNetV2 models on the CIFAR-10 dataset.
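
Figure 5 tracks the trace of the Hessian during training. This summary does not say how the trace is computed; for large networks a common choice is Hutchinson's estimator, $\operatorname{Tr}(H) \approx \frac{1}{m}\sum_{i=1}^{m} v_i^\top H v_i$ with random sign vectors $v_i$. The sketch below applies it to a toy quadratic loss whose exact trace is known. The function name `hutchinson_trace`, the finite-difference Hessian-vector product, and the toy loss are assumptions made for illustration, not the authors' procedure.

```python
import numpy as np

def hutchinson_trace(grad_fn, w, num_samples=200, eps=1e-4, rng=None):
    """Estimate Tr(H) at w, where H is the Jacobian of grad_fn (the Hessian of the loss).

    Hessian-vector products are approximated with central differences of
    the gradient, so only gradient evaluations are needed.
    """
    rng = np.random.default_rng() if rng is None else rng
    estimates = []
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=w.shape)  # Rademacher probe vector
        hv = (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2.0 * eps)
        estimates.append(v @ hv)                   # one sample of v^T H v
    return float(np.mean(estimates))

# Toy loss L(w) = 0.5 * w^T A w, whose gradient is A w and whose Hessian is A.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = B @ B.T                      # symmetric positive semi-definite Hessian
grad_fn = lambda w: A @ w
w0 = rng.standard_normal(6)

est = hutchinson_trace(grad_fn, w0, num_samples=2000, rng=rng)
exact = float(np.trace(A))
print(est, exact)
```

In a deep-learning framework the finite-difference step would normally be replaced by an exact Hessian-vector product from automatic differentiation; the averaging over probe vectors stays the same.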

Theorems & Definitions (2)

  • Proposition 1
  • proof