In-Distribution Consistency Regularization Improves the Generalization of Quantization-Aware Training
Junbiao Pang, Tianyang Cai, Baochang Zhang, Jiaqi Wu
TL;DR
This work tackles the generalization gap in Quantization-Aware Training (QAT) by introducing Consistency Regularization (CR), which enforces stable predictions for two augmented views of the same input through a teacher-student framework with Exponential Moving Average (EMA) updates. By incorporating in-distribution unlabeled data and a KL-based consistency loss between the student and the teacher, CR promotes a flatter loss landscape and reduces sensitivity to input and weight perturbations; the theoretical analysis links consistency to reduced sharpness. Empirically, CR improves on state-of-the-art QAT methods across CIFAR-10/100 and ImageNet, often surpassing or closely matching FP32 performance on several architectures, and shows strong gains when unlabeled data is used and the CR strength is carefully scheduled. The method is simple to adopt, adapts to various QAT pipelines, and is practical for deploying efficient low-bit models on edge devices while leveraging unlabeled data for better generalization.
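The teacher-student consistency objective summarized above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The KL direction (teacher as target), the EMA decay value, and the names `consistency_loss`, `ema_update`, `total_loss`, and `cr_weight` are all assumptions made for illustration; the sketch also assumes the teacher weights track an EMA of the student weights, as in mean-teacher style training.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(student_logits, teacher_logits):
    """KL between teacher and student predictions on two views of one input.

    Assumption: the teacher prediction is the (constant) target, i.e.
    KL(teacher || student). The source only states that the loss is KL-based.
    """
    teacher_probs = softmax(teacher_logits)
    student_probs = softmax(student_logits)
    return kl_divergence(teacher_probs, student_probs)


def ema_update(teacher_weights, student_weights, decay=0.999):
    """Move each teacher weight toward the student weight (EMA).

    Assumption: decay=0.999 is a placeholder, not a value from the paper.
    """
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_weights, student_weights)]


def total_loss(supervised_loss, cr_loss, cr_weight):
    """Supervised loss plus the weighted consistency term.

    `cr_weight` is a hypothetical name for the scheduled CR strength.
    """
    return supervised_loss + cr_weight * cr_loss


# Usage: logits for two augmented views of the same (possibly unlabeled) input.
student_logits = [2.0, 0.5, -1.0]   # quantized student, view A
teacher_logits = [1.8, 0.7, -0.9]   # EMA teacher, view B
cr = consistency_loss(student_logits, teacher_logits)
loss = total_loss(supervised_loss=0.4, cr_loss=cr, cr_weight=1.0)
new_teacher = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9)
```

Because the consistency term needs no labels, the same `consistency_loss` call can be applied to unlabeled in-distribution inputs, with the supervised term computed only on labeled ones.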
Abstract
Although existing Quantization-Aware Training (QAT) methods rely heavily on knowledge distillation to preserve performance, QAT still suffers from a severe performance drop. Experiments show that vanilla quantization is sensitive to perturbations of both the input and the weights. We therefore hypothesize that the limited generalization ability of QAT stems predominantly from two sources: intrinsic instability at training time and limited generalization at test time. In this paper, we address both issues from a new perspective by leveraging Consistency Regularization (CR) to improve the generalization ability of QAT. Empirical results and theoretical analysis verify that CR brings good generalization ability to different network architectures and various QAT methods. Extensive experiments demonstrate that our approach significantly outperforms current state-of-the-art QAT methods and even the full-precision (FP) counterparts. On CIFAR-10, the proposed method improves over the baseline by 3.79% with ResNet18 and by 3.84% with the lightweight MobileNet.
