Outlier-Aware Training for Low-Bit Quantization of Structural Re-Parameterized Networks

Muqun Niu; Yuan Ren; Boyu Li; Chenchen Ding

TL;DR

The paper tackles the quantization difficulty of structural re-parameterized (SR) networks, specifically RepVGG, caused by weight outliers that arise when the training-time branches are merged for inference. It introduces Outlier-Aware Batch Normalization (OABN) to suppress these outliers during training, and ClusterQAT, a clustering-based Quantization-Aware Training method, to preserve weight distributions under low-bit quantization. Experimental results on CIFAR-10 and ImageNet show that combining OABN with ClusterQAT substantially improves quantized accuracy at 8 bits and below, making RepVGG viable under tight memory constraints. The approach is a practical, training-time route to edge-ready quantization of SR networks that does not require substantial architecture changes.

Abstract

Lightweight design of Convolutional Neural Networks (CNNs) requires co-design of model architectures and compression techniques. As a novel design paradigm that separates training from inference, a structural re-parameterized (SR) network such as the representative RepVGG revitalizes the simple VGG-like network, reaching accuracy comparable to advanced and often more complicated networks. However, the merging process in SR networks introduces outliers into the weights, making their distribution distinct from that of conventional networks and thus making quantization harder. To address this, we propose an operator-level improvement for training called Outlier-Aware Batch Normalization (OABN). Additionally, to meet the demands of limited bitwidths while maintaining inference accuracy, we develop a clustering-based non-uniform quantization framework for Quantization-Aware Training (QAT) named ClusterQAT. Integrating OABN with ClusterQAT substantially improves the quantized performance of RepVGG, particularly when the bitwidth falls below 8.
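To make the idea of clustering-based non-uniform quantization concrete, the following is a minimal sketch using plain 1-D k-means on a weight tensor. It is not the authors' ClusterQAT algorithm, whose details are not given in this abstract. The function names (`kmeans_1d`, `cluster_quantize`, `uniform_quantize`), the synthetic weights, and the injected outliers are all illustrative assumptions.

```python
import numpy as np

def kmeans_1d(values, k, iters=25):
    """Plain Lloyd's k-means on a 1-D array; returns the sorted centroids."""
    # Start from evenly spaced quantiles so dense regions get more centroids.
    centroids = np.quantile(values, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        # Assign every value to its nearest centroid.
        assign = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            members = values[assign == j]
            if members.size:  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean()
    return np.sort(centroids)

def cluster_quantize(weights, bits):
    """Non-uniform quantization: snap each weight to one of 2**bits centroids."""
    flat = weights.ravel()
    codebook = kmeans_1d(flat, 2 ** bits)
    idx = np.argmin(np.abs(flat[:, None] - codebook[None, :]), axis=1)
    return codebook[idx].reshape(weights.shape), codebook

def uniform_quantize(weights, bits):
    """Symmetric uniform quantizer scaled by the largest magnitude (baseline)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(weights).max() / levels
    return np.round(weights / scale) * scale

# Synthetic stand-in for merged weights (assumption): a narrow bulk plus a few
# large values at one kernel position, mimicking outliers.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(64, 9))
w[:, 4] += 1.0

w_cluster, codebook = cluster_quantize(w, bits=4)
w_uniform = uniform_quantize(w, bits=4)

mse_cluster = float(np.mean((w - w_cluster) ** 2))
mse_uniform = float(np.mean((w - w_uniform) ** 2))
print("cluster MSE:", mse_cluster)
print("uniform MSE:", mse_uniform)
```

On this synthetic tensor the uniform grid is stretched by the outliers, so most of its levels fall where there are few weights, while the learned codebook places its levels where the weights actually lie. This is the general motivation for non-uniform quantization at low bitwidths and is shown here only as an illustration.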

Paper Structure (17 sections, 4 equations, 8 figures, 7 tables, 2 algorithms)

INTRODUCTION
OABN - OPERATOR FOR IMPROVING REPVGG'S QUANTIZATION
REPVGG HAS OUTLIERS IN WEIGHTS
OABN - OUTLIER AWARE BATCH NORMALIZATION
CLUSTERQAT - LOWER-BIT QUANTIZATION FOR REPVGG
QUANTIZATION-AWARE TRAINING FOR REPVGG
CLUSTERQAT - QUANTIZATION-AWARE TRAINING WITH CLUSTERING
EXPERIMENTS AND DISCUSSIONS
EXPERIMENT SETTINGS
RESULTS WITH OABN
8-bit uniform quantization with OABN
8-bit PTQ results with OABN
RESULTS WITH OABN AND CLUSTERQAT
Lower-bit Quantization with OABN and ClusterQAT
COMPUTATION COST FOR TRAINING
...and 2 more sections

Figures (8)

Figure 1: Overview of outlier-aware training for low-bit quantization of structural re-parameterized networks.
Figure 2: Example visualization of the weights in the 43rd channel of the 5th layer. (a) RepVGG - hard to quantize. (b) ResNet - easy to quantize.
Figure 3: Merging steps in RepVGG. Each branch is first converted into a $3\times3$ kernel, and the converted kernels are then added up to form the final $3\times3$ Conv kernel.
Figure 4: Findings about positions and magnitudes of the merged weights. (a) Magnitudes of merged weights fit the pattern in Figure 3 about the diagonal elements in Class A (e.g. in layer 2). (b) Comparisons of channel-wise ratios $\frac{\gamma}{\sqrt{\sigma^{2}}}$ $(\mathcal{W}^{\mathrm{(id)}}_{\mathrm{Class\;A}})$ and weights in Class A $(\mathcal{W}_{\mathrm{Class\;A}})$.
Figure 5: Uniform quantization schemes for weights and activations, respectively. (a) Weight quantizer (b) Activation quantizer.
...and 3 more figures
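
The merging steps named in the Figure 3 caption can be sketched in NumPy as follows. This is a sketch under assumptions, not the authors' code: it uses the standard inference-time batch-normalization formula with a small epsilon, a block with equal input and output channels and stride 1 (so the identity branch exists), and hypothetical helper names (`conv2d`, `batch_norm`, `fuse_bn`). It checks that the three training branches and the single merged $3\times3$ kernel give the same output, and that the identity branch contributes the ratio $\gamma/\sqrt{\sigma^{2}+\epsilon}$ to the center diagonal elements of the merged kernel, which is the quantity the Figure 4(b) caption refers to.

```python
import numpy as np

EPS = 1e-5  # assumed BN epsilon

def conv2d(x, w):
    """'Same' cross-correlation: x is (C_in, H, W), w is (C_out, C_in, k, k), k odd."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    H, W = x.shape[1:]
    out = np.zeros((w.shape[0], H, W))
    for dy in range(k):
        for dx in range(k):
            # Add the contribution of kernel tap (dy, dx).
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + H, dx:dx + W])
    return out

def batch_norm(x, params):
    """Inference-mode BN with fixed statistics; params = (gamma, beta, mean, var)."""
    gamma, beta, mean, var = params
    scale = gamma / np.sqrt(var + EPS)
    return x * scale[:, None, None] + (beta - mean * scale)[:, None, None]

def fuse_bn(kernel, params):
    """Fold BN into the preceding conv kernel; returns (kernel, bias)."""
    gamma, beta, mean, var = params
    scale = gamma / np.sqrt(var + EPS)
    return kernel * scale[:, None, None, None], beta - mean * scale

rng = np.random.default_rng(0)
C = 4  # channels (equal in and out, so the identity branch is valid)

def random_bn():
    return (rng.uniform(0.5, 1.5, C), rng.normal(size=C),
            rng.normal(size=C), rng.uniform(0.5, 1.5, C))

w3 = rng.normal(size=(C, C, 3, 3))   # 3x3 branch
w1 = rng.normal(size=(C, C, 1, 1))   # 1x1 branch
bn3, bn1, bn_id = random_bn(), random_bn(), random_bn()

# Step 1: express every branch as a 3x3 kernel.
w1_as_3x3 = np.pad(w1, ((0, 0), (0, 0), (1, 1), (1, 1)))  # 1x1 goes to the center
w_id = np.zeros((C, C, 3, 3))
w_id[np.arange(C), np.arange(C), 1, 1] = 1.0              # identity as a conv

# Step 2: fold each BN into its kernel, then add kernels and biases.
fused = [fuse_bn(w3, bn3), fuse_bn(w1_as_3x3, bn1), fuse_bn(w_id, bn_id)]
merged_kernel = sum(k for k, _ in fused)
merged_bias = sum(b for _, b in fused)

# Check equivalence of the multi-branch and merged forms on a random input.
x = rng.normal(size=(C, 6, 6))
y_train = (batch_norm(conv2d(x, w3), bn3)
           + batch_norm(conv2d(x, w1), bn1)
           + batch_norm(x, bn_id))
y_merged = conv2d(x, merged_kernel) + merged_bias[:, None, None]
print("branches match merged kernel:", np.allclose(y_train, y_merged))

# The identity branch only touches the center diagonal elements.
id_ratio = bn_id[0] / np.sqrt(bn_id[3] + EPS)
id_center_diag = fused[2][0][np.arange(C), np.arange(C), 1, 1]
print("identity contribution equals gamma/sqrt(var+eps):", np.allclose(id_center_diag, id_ratio))
```

Because the identity branch adds its whole ratio to a single position per channel, a channel with a small running variance would receive a comparatively large value at that position. This is offered only as intuition for why the merged weights can contain outliers.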
