Efficient and Robust Quantization-aware Training via Adaptive Coreset Selection
Xijie Huang, Zechun Liu, Shih-Yang Liu, Kwang-Ting Cheng
TL;DR
This work tackles the high training cost and vulnerability to label noise of quantization-aware training (QAT). It introduces two loss-gradient metrics, $d_{\text{EVS}}$ and $d_{\text{DS}}$, and an adaptive coreset selection (ACS) strategy that re-selects informative data as training progresses. ACS combines the two metrics with a cosine-annealed weighting, $d_{\text{ACS}}(t) = \beta(t) d_{\text{EVS}}(t) + (1-\beta(t)) d_{\text{DS}}(t)$ where $\beta(t) = \cos(\frac{t}{2E}\pi)$, and selects data every $R$ epochs to balance adaptation and diversity. The approach is validated on CNNs and RetinaNet across CIFAR-10/100, ImageNet-1K, and COCO. It reaches 68.39% top-1 accuracy with 10% of the ImageNet data for 4-bit ResNet-18 and improves robustness under label noise, while incurring minimal selection overhead. These results indicate that QAT can be made considerably more data-efficient and robust, and that the EVS/DS-based ACS framework may carry over to noisy-label scenarios and object-detection tasks.
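As an illustration of the weighting above, the following is a minimal sketch rather than the authors' implementation. It assumes that $E$ is the total number of training epochs, that the per-sample scores $d_{\text{EVS}}$ and $d_{\text{DS}}$ have already been computed (their definitions are not reproduced here), that the samples with the highest $d_{\text{ACS}}$ are kept, and that re-selection happens at epochs $0, R, 2R, \dots$. The function names `acs_beta`, `acs_score`, `should_reselect`, and `select_coreset` are hypothetical.

```python
import math

import numpy as np


def acs_beta(epoch: int, total_epochs: int) -> float:
    """Cosine-annealed weight beta(t) = cos(t * pi / (2E)).

    Assumes E (total_epochs) is the total number of training epochs.
    """
    return math.cos(epoch * math.pi / (2 * total_epochs))


def acs_score(d_evs, d_ds, epoch: int, total_epochs: int) -> np.ndarray:
    """Combine precomputed per-sample scores: beta * d_EVS + (1 - beta) * d_DS."""
    beta = acs_beta(epoch, total_epochs)
    d_evs = np.asarray(d_evs, dtype=float)
    d_ds = np.asarray(d_ds, dtype=float)
    return beta * d_evs + (1.0 - beta) * d_ds


def should_reselect(epoch: int, interval: int) -> bool:
    """Assumed schedule: re-select at epochs 0, R, 2R, ... (interval = R)."""
    return epoch % interval == 0


def select_coreset(d_evs, d_ds, epoch: int, total_epochs: int,
                   fraction: float) -> np.ndarray:
    """Return sorted indices of the top `fraction` of samples by d_ACS.

    Keeping the highest-scoring samples is an assumption of this sketch.
    """
    scores = acs_score(d_evs, d_ds, epoch, total_epochs)
    k = max(1, int(round(fraction * scores.size)))
    top = np.argsort(-scores, kind="stable")[:k]
    return np.sort(top)


if __name__ == "__main__":
    # Toy scores for four samples (made-up numbers, for illustration only).
    d_evs = [0.9, 0.1, 0.5, 0.3]
    d_ds = [0.0, 1.0, 0.2, 0.8]
    for epoch in (0, 5, 10):
        if should_reselect(epoch, interval=5):
            idx = select_coreset(d_evs, d_ds, epoch, total_epochs=10, fraction=0.5)
            print(epoch, round(acs_beta(epoch, 10), 3), idx.tolist())
```

Under these assumptions, $\beta(0) = 1$, so the first selection ranks samples purely by $d_{\text{EVS}}$. As $t$ approaches $E$, $\beta(t)$ approaches 0 and the ranking is dominated by $d_{\text{DS}}$.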
Abstract
Quantization-aware training (QAT) is a representative model compression method that reduces redundancy in weights and activations. However, most existing QAT methods require end-to-end training on the entire dataset, which incurs long training times and high energy costs. In addition, potential label noise in the training data undermines the robustness of QAT. We propose two metrics, the error vector score and the disagreement score, based on an analysis of the loss and gradient of quantized weights, to quantify the importance of each sample during training. Guided by these two metrics, we propose a quantization-aware Adaptive Coreset Selection (ACS) method to select the data for the current training epoch. We evaluate our method on various networks (ResNet-18, MobileNetV2, RetinaNet) and datasets (CIFAR-10, CIFAR-100, ImageNet-1K, COCO) under different quantization settings. Specifically, our method achieves 68.39% accuracy with a 4-bit quantized ResNet-18 on ImageNet-1K using only a 10% subset, an absolute gain of 4.24% over the baseline. Our method also improves the robustness of QAT by removing noisy samples from the training set.
