Towards Accurate Post-training Quantization for Reparameterized Models
Luoming Zhang, Yefei He, Wen Fei, Zhenyu Lou, Weijia Wu, YangWei Ying, Hong Zhou
TL;DR
This paper tackles post-training quantization (PTQ) for reparameterized networks, where activation outliers driven by BatchNorm interactions cause substantial accuracy loss. It introduces RepAPQ, a framework with two components: Quantization Protecting Reparameterization (QPRep), which adds a channel-wise affine layer after reparameterized convolutions to stabilize outliers, and Across-block Calibration (ABC), which uses Mean Absolute Error (MAE) to distill information across blocks and stages. The approach outperforms prior PTQ methods by roughly 1% at 8-bit and 2% at 6-bit on models such as RepVGG and MobileOne, and maintains competitive accuracy on object detection with YOLOv6. The results indicate that an MAE-based error measure combined with across-block supervision enables accurate low-bit quantization of reparameterized networks, which matters for deploying fast, memory-efficient models on hardware. The code accompanying RepAPQ is publicly available.
Abstract
Model reparameterization is a widely accepted technique for improving inference speed without compromising performance. However, current Post-training Quantization (PTQ) methods often cause significant accuracy degradation when applied to reparameterized models. The primary cause is outliers that appear only in specific channels and for specific samples, which distort the selection of quantization parameters. To address this issue, we propose RepAPQ, a novel framework that preserves the accuracy of quantized reparameterized models. Unlike previous frameworks, which use Mean Squared Error (MSE) as the error measure, we use Mean Absolute Error (MAE) to mitigate the influence of outliers on quantization parameters. Our framework comprises two main components: Quantization Protecting Reparameterization and Across-block Calibration. For effective calibration, Quantization Protecting Reparameterization combines multiple branches into a single convolution followed by an affine layer. During training, the affine layer accelerates convergence and amplifies the output of the convolution to better accommodate samples with outliers. Additionally, Across-block Calibration uses the error measured at the stage output as supervision to address the gradient problem introduced by MAE and to strengthen the interlayer correlation of quantization parameters. Comprehensive experiments demonstrate the effectiveness of RepAPQ across various models and tasks. Our framework outperforms previous methods by approximately 1% for 8-bit PTQ and 2% for 6-bit PTQ. The code is available at https://github.com/ilur98/DLMC-QUANT.
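
To make the MSE-versus-MAE argument concrete, the following toy sketch picks a clipping threshold for a symmetric uniform quantizer by grid search, once under squared error and once under absolute error. It is an illustration only and not the RepAPQ calibration procedure: the function names (`fake_quantize`, `best_threshold`), the 4-bit setting, the synthetic activations, and the candidate grid are all assumptions introduced here.

```python
def fake_quantize(x, threshold, n_bits=4):
    """Quantize-dequantize one value with a symmetric uniform quantizer.

    Values are clipped to [-threshold, +threshold]. Hypothetical helper.
    """
    qmax = 2 ** (n_bits - 1) - 1          # largest integer level, e.g. 7 for 4 bits
    scale = threshold / qmax              # step size between adjacent levels
    q = max(-qmax, min(qmax, round(x / scale)))
    return q * scale


def best_threshold(values, candidates, error_fn, n_bits=4):
    """Return the candidate threshold with the lowest mean error under error_fn."""
    def mean_error(t):
        total = sum(error_fn(v - fake_quantize(v, t, n_bits)) for v in values)
        return total / len(values)
    return min(candidates, key=mean_error)


# Synthetic activations (an assumption): a dense bulk in [-1, 1]
# plus a handful of large outliers, roughly 0.5% of the samples.
bulk = [(i - 500) / 500 for i in range(1001)]
outliers = [20.0] * 5
activations = bulk + outliers

# Candidate clipping thresholds 0.5, 1.0, ..., 20.0.
candidates = [0.5 * k for k in range(1, 41)]

t_mse = best_threshold(activations, candidates, lambda e: e * e)
t_mae = best_threshold(activations, candidates, abs)

print("threshold chosen by MSE:", t_mse)
print("threshold chosen by MAE:", t_mae)
```

On this data the squared-error criterion stretches the clipping range toward the outliers, because clipping a large value is penalized quadratically, while the absolute-error criterion keeps the range close to the bulk of the activations and so preserves resolution for most values. The size of the gap depends on the bit width and on the fraction of outliers, so the sketch should be read as intuition rather than as a reproduction of the reported results.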
