
Efficient and Robust Quantization-aware Training via Adaptive Coreset Selection

Xijie Huang, Zechun Liu, Shih-Yang Liu, Kwang-Ting Cheng

TL;DR

This work tackles the high training cost and the vulnerability to label noise of quantization-aware training (QAT) by introducing two metrics based on the loss and gradient of quantized weights, the error vector score $d_{\text{EVS}}$ and the disagreement score $d_{\text{DS}}$, together with an Adaptive Coreset Selection (ACS) strategy that re-selects informative samples as training progresses. ACS combines the two metrics with a cosine-annealed weighting, $d_{\text{ACS}}(t) = \beta(t)\, d_{\text{EVS}}(t) + (1-\beta(t))\, d_{\text{DS}}(t)$ with $\beta(t) = \cos\left(\frac{t}{2E}\pi\right)$, and re-selects the coreset every $R$ epochs to balance adaptation and diversity. The approach is validated on CNNs and RetinaNet across CIFAR-10/100, ImageNet-1K, and COCO. For example, 4-bit ResNet-18 reaches 68.39% top-1 accuracy using only 10% of the ImageNet-1K training data, and robustness under label noise improves, with minimal selection overhead. These results indicate that QAT can be made considerably more data-efficient and robust, and that the EVS/DS-based ACS framework may extend to noisy-label scenarios and object-detection tasks.
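The cosine-annealed combination above can be illustrated with a minimal sketch. It assumes that $E$ is the total number of training epochs, that the per-sample scores $d_{\text{EVS}}$ and $d_{\text{DS}}$ have already been computed and are passed in as arrays, and that the coreset keeps the highest-scoring samples. The function names (`beta`, `acs_score`, `select_coreset`, `should_reselect`) are hypothetical and are not taken from the authors' code.

```python
import math

import numpy as np


def beta(t: int, total_epochs: int) -> float:
    """Cosine-annealed weight beta(t) = cos(pi * t / (2E)).

    Assumption: E (total_epochs) is the total number of training epochs,
    so the weight decays from 1 at t = 0 to 0 at t = E.
    """
    return math.cos(math.pi * t / (2 * total_epochs))


def acs_score(d_evs, d_ds, t: int, total_epochs: int) -> np.ndarray:
    """d_ACS(t) = beta(t) * d_EVS(t) + (1 - beta(t)) * d_DS(t), per sample."""
    b = beta(t, total_epochs)
    return b * np.asarray(d_evs, dtype=float) + (1.0 - b) * np.asarray(d_ds, dtype=float)


def select_coreset(scores, fraction: float) -> np.ndarray:
    """Return indices of the top-scoring `fraction` of samples.

    Assumption: higher d_ACS means a more important sample.
    """
    scores = np.asarray(scores, dtype=float)
    k = max(1, int(round(fraction * len(scores))))
    return np.argsort(scores)[::-1][:k]


def should_reselect(epoch: int, interval: int) -> bool:
    """Assumption: the coreset is refreshed at epochs 0, R, 2R, ..."""
    return epoch % interval == 0


if __name__ == "__main__":
    # Placeholder scores for four samples (illustrative values only).
    d_evs = [0.1, 0.9, 0.5, 0.3]
    d_ds = [0.8, 0.2, 0.4, 0.6]
    total_epochs, interval = 10, 5
    for epoch in range(total_epochs):
        if should_reselect(epoch, interval):
            scores = acs_score(d_evs, d_ds, epoch, total_epochs)
            print(epoch, select_coreset(scores, fraction=0.5))
```

Under these assumptions the weighting starts entirely on $d_{\text{EVS}}$ and shifts toward $d_{\text{DS}}$ as $t$ approaches $E$.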

Abstract

Quantization-aware training (QAT) is a representative model compression method to reduce redundancy in weights and activations. However, most existing QAT methods require end-to-end training on the entire dataset, which suffers from long training time and high energy costs. In addition, the potential label noise in the training data undermines the robustness of QAT. We propose two metrics based on analysis of the loss and gradient of quantized weights, the error vector score and the disagreement score, to quantify the importance of each sample during training. Guided by these two metrics, we propose a quantization-aware Adaptive Coreset Selection (ACS) method to select the data for the current training epoch. We evaluate our method on various networks (ResNet-18, MobileNetV2, RetinaNet), datasets (CIFAR-10, CIFAR-100, ImageNet-1K, COCO), and quantization settings. Specifically, our method achieves an accuracy of 68.39% for 4-bit quantized ResNet-18 on the ImageNet-1K dataset with only a 10% subset, an absolute gain of 4.24% over the baseline. Our method can also improve the robustness of QAT by removing noisy samples from the training set.

Paper Structure

This paper contains 36 sections, 16 equations, 5 figures, 13 tables, 1 algorithm.

Figures (5)

  • Figure 1: Left: Data scaling curve for 4-bit quantized ResNet-18 on the ImageNet-1K dataset. Our ACS significantly reduces test error compared to baselines at the same training data fraction. Middle: Accuracy of 2/32-bit quantized ResNet-18 trained on CIFAR-10 with 10% random label noise. Our ACS is the only method to outperform full-data training, which it does by effectively removing noisy samples. Right: Test error for quantized ResNet-18 with different data fractions and the same GPU training hours. Under the same training budget, QAT with a smaller coreset selected by ACS outperforms full-data training.
  • Figure 2: An overview of the Adaptive Coreset Selection (ACS) for efficient and robust QAT.
  • Figure 3: Visualization of the loss landscape (Li et al., 2018) of 2-bit quantized MobileNetV2 trained on the CIFAR-100 (a) full dataset, (b) 10% random subset, and (c) 10% ACS coreset.
  • Figure 4: (a) Distribution of disagreement scores $d_{\text{DS}}$ on MobileNetV2 for different epochs. (b) Distribution of disagreement scores $d_{\text{DS}}$ and error vector scores $d_{\text{EVS}}$ on MobileNetV2 in the same epoch.
  • Figure 5: Training and testing loss and accuracy of 4/4-bit quantized ResNet-18 on a 10% coreset of ImageNet-1K with different selection intervals $R$.