Efficient and Robust Quantization-aware Training via Adaptive Coreset Selection

Xijie Huang; Zechun Liu; Shih-Yang Liu; Kwang-Ting Cheng

TL;DR

This work tackles the high training cost and vulnerability to label noise of quantization-aware training (QAT) by introducing two metrics based on the loss and gradient of quantized weights, the error vector score $d_{\text{EVS}}$ and the disagreement score $d_{\text{DS}}$, together with an adaptive coreset selection (ACS) strategy that re-selects informative training data across epochs. ACS combines the two metrics with a cosine-annealed weight, $d_{\text{ACS}}(t) = \beta(t)\, d_{\text{EVS}}(t) + (1-\beta(t))\, d_{\text{DS}}(t)$ with $\beta(t) = \cos\left(\frac{t}{2E}\pi\right)$, where $t$ is the current epoch and $E$ the total number of training epochs, and it re-selects data every $R$ epochs to balance adaptation and diversity. The approach is validated on CNNs and RetinaNet across CIFAR-10/100, ImageNet-1K, and COCO. It reaches 68.39% top-1 accuracy for 4-bit ResNet-18 with a 10% subset of ImageNet-1K and improves robustness under label noise, while incurring minimal selection overhead. These results indicate that QAT can be made far more data-efficient and robust, with potential applicability to noisy-label scenarios and object-detection tasks via the EVS/DS-based ACS framework.
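The cosine-annealed combination in the TL;DR can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the per-sample scores $d_{\text{EVS}}$ and $d_{\text{DS}}$ are already computed and passed in as arrays, that the coreset consists of the highest-scoring samples, and that re-selection happens whenever the epoch index is a multiple of $R$. The function names (`beta`, `acs_score`, `select_coreset`, `should_reselect`) and the `fraction` parameter are hypothetical.

```python
import math

import numpy as np


def beta(t: int, total_epochs: int) -> float:
    """Cosine-annealed weight beta(t) = cos(t / (2E) * pi); 1 at t=0, ~0 at t=E."""
    return math.cos(t / (2 * total_epochs) * math.pi)


def acs_score(d_evs, d_ds, t: int, total_epochs: int) -> np.ndarray:
    """d_ACS(t) = beta(t) * d_EVS(t) + (1 - beta(t)) * d_DS(t), per sample."""
    b = beta(t, total_epochs)
    d_evs = np.asarray(d_evs, dtype=float)
    d_ds = np.asarray(d_ds, dtype=float)
    return b * d_evs + (1.0 - b) * d_ds


def select_coreset(d_evs, d_ds, t: int, total_epochs: int, fraction: float) -> np.ndarray:
    """Return indices of the top `fraction` of samples by d_ACS.

    Assumption: higher d_ACS means a more important sample.
    """
    scores = acs_score(d_evs, d_ds, t, total_epochs)
    k = max(1, int(round(fraction * scores.size)))
    # Stable sort on negated scores gives descending order with ties kept in input order.
    return np.argsort(-scores, kind="stable")[:k]


def should_reselect(t: int, interval: int) -> bool:
    """Assumption: the coreset is refreshed at epochs 0, R, 2R, ..."""
    return t % interval == 0


if __name__ == "__main__":
    # Toy scores for four samples (made-up numbers, for illustration only).
    evs = [0.1, 0.9, 0.5, 0.3]
    ds = [0.9, 0.1, 0.2, 0.8]
    for epoch in (0, 5, 10):
        if should_reselect(epoch, interval=5):
            print(epoch, select_coreset(evs, ds, epoch, total_epochs=10, fraction=0.5))
```

With this weighting, early epochs rank samples almost entirely by $d_{\text{EVS}}$ and late epochs almost entirely by $d_{\text{DS}}$, which is the behaviour the formula for $\beta(t)$ implies.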

Abstract

Quantization-aware training (QAT) is a representative model compression method that reduces redundancy in weights and activations. However, most existing QAT methods require end-to-end training on the entire dataset, which incurs long training times and high energy costs. In addition, potential label noise in the training data undermines the robustness of QAT. We propose two metrics based on an analysis of the loss and gradient of quantized weights, the error vector score and the disagreement score, to quantify the importance of each sample during training. Guided by these two metrics, we propose a quantization-aware Adaptive Coreset Selection (ACS) method to select the data for the current training epoch. We evaluate our method on various networks (ResNet-18, MobileNetV2, RetinaNet), datasets (CIFAR-10, CIFAR-100, ImageNet-1K, COCO), and quantization settings. Specifically, our method achieves an accuracy of 68.39\% with a 4-bit quantized ResNet-18 on the ImageNet-1K dataset using only a 10\% subset, an absolute gain of 4.24\% over the baseline. Our method can also improve the robustness of QAT by removing noisy samples from the training set.

Paper Structure (36 sections, 16 equations, 5 figures, 13 tables, 1 algorithm)

Introduction
Related Work
Quantization
Learning from Noisy Labels
Coreset Selection
Importance of Each Sample in QAT
Preliminaries of QAT
Error Vector Score
Definition 1 (Error Vector Score)
Disagreement Score and Knowledge Distillation
Definition 2 (Disagreement Score)
Adaptive Coreset Selection for QAT
Experiments
Datasets and networks
Baselines
...and 21 more sections

Figures (5)

Figure 1: Left: Data scaling curve for 4-bit quantized ResNet-18 on the ImageNet-1K dataset. Our ACS significantly reduces test error compared to baselines at the same training data fraction. Middle: Accuracy of 2/32-bit quantized ResNet-18 trained on CIFAR-10 with 10% random label noise. Our ACS is the only method that outperforms full-data training, by effectively removing noisy samples. Right: Test error for quantized ResNet-18 with different data fractions and the same GPU training hours. Under the same training budget, QAT with a smaller coreset selected by ACS outperforms full-data training.
Figure 2: An overview of the Adaptive Coreset Selection (ACS) for efficient and robust QAT.
Figure 3: Visualization of the loss landscape (Li et al., 2018) of 2-bit quantized MobileNetV2 trained on the CIFAR-100 (a) full dataset, (b) 10% random subset, and (c) 10% ACS coreset.
Figure 4: (a) Distribution of disagreement scores $d_{\text{DS}}$ on MobileNetV2 for different epochs. (b) Distribution of disagreement scores $d_{\text{DS}}$ and error vector scores $d_{\text{EVS}}$ on MobileNetV2 in the same epoch.
Figure 5: Training/testing loss and accuracy of 4/4-bit quantized ResNet-18 on a 10% coreset of ImageNet-1K with different selection intervals $R$.
