Advanced Knowledge Transfer: Refined Feature Distillation for Zero-Shot Quantization in Edge Computing

Inpyo Hong; Youngwan Jo; Hyojeong Lee; Sunghyun Ahn; Sanghyun Park

TL;DR

This paper introduces AKT (Advanced Knowledge Transfer), which addresses zero-shot quantization (ZSQ) in edge computing by refining feature distillation so that the quantized model preserves both spatial and channel information from a full-precision teacher. The method combines dual-information decomposition, refined feature distillation, and an additive knowledge-transfer objective $L_{AKT} = \alpha L_{RFD} + (1-\alpha)L_{KL}$ (with $\alpha$ typically 0.5) to transfer both internal representations and outputs. Experiments on CIFAR-10/100 show substantial gains, especially at 3-bit and 5-bit, and AKT achieves state-of-the-art performance on CIFAR-10 in these low-bit regimes when paired with contemporary ZSQ baselines such as AdaDFQ; ablations confirm the benefit of jointly leveraging spatial and channel attention. The training objective is data-agnostic and applies across multiple data-generation strategies, which suggests meaningful impact for efficient inference on edge devices and can guide future work at the intersection of generative methods and learning-based quantization.
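The additive objective $L_{AKT} = \alpha L_{RFD} + (1-\alpha)L_{KL}$ can be illustrated with a minimal Python sketch. Only the weighted sum and the default $\alpha = 0.5$ come from the text above; the function names, the temperature parameter, and the choice of KL(teacher || student) over softmax outputs are assumptions made for illustration and may differ from the authors' released code.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_sum for z in scaled]


def kl_divergence(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) on softened outputs (assumed form of L_KL)."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def akt_loss(rfd_loss, kl_loss, alpha=0.5):
    """L_AKT = alpha * L_RFD + (1 - alpha) * L_KL."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * rfd_loss + (1.0 - alpha) * kl_loss


# Hypothetical values: a precomputed feature loss and toy logits.
l_kl = kl_divergence([2.0, 1.0, 0.1], [1.5, 1.2, 0.3])
total = akt_loss(rfd_loss=0.8, kl_loss=l_kl)
```

With $\alpha = 0.5$ the feature-level and output-level terms contribute equally; moving $\alpha$ toward 1 puts more weight on matching internal representations.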

Abstract

We introduce AKT (Advanced Knowledge Transfer), a novel method to enhance the training ability of low-bit quantized (Q) models in the field of zero-shot quantization (ZSQ). Existing research in ZSQ has focused on generating high-quality data from full-precision (FP) models. However, these approaches struggle with reduced learning ability in low-bit quantization because of its limited information capacity. To overcome this limitation, we focus on an effective training strategy rather than on data generation. In particular, our analysis shows that refining feature maps during feature distillation is an effective way to transfer knowledge to the Q model. Based on this analysis, AKT efficiently transfers core information from the FP model to the Q model. AKT is the first approach to utilize both spatial and channel attention information in feature distillation in ZSQ. Our method addresses the fundamental gradient exploding problem in low-bit Q models. Experiments on the CIFAR-10 and CIFAR-100 datasets demonstrate the effectiveness of AKT. Our method significantly improved the performance of existing generative models. Notably, AKT achieved significant accuracy improvements in low-bit Q models, reaching state-of-the-art results in the 3-bit and 5-bit settings on CIFAR-10. The code is available at https://github.com/Inpyo-Hong/AKT-Advanced-knowledge-Transfer.

Paper Structure (18 sections, 9 equations, 3 figures, 4 tables)

This paper contains 18 sections, 9 equations, 3 figures, 4 tables.

Introduction
Related Work
Quantization
Zero-shot Quantization
Data Generation Methodology
Training Methodology
Preliminaries
Knowledge Distillation
AKT Method
Dual-Information Decomposition
Refined Feature Distillation
Advanced Knowledge Transfer
Experiments
Experimental Environments
Experimental Results
...and 3 more sections

Figures (3)

Figure 1: An overview of the AKT (Advanced Knowledge Transfer) method. Step 1 illustrates the process of decomposing each feature map into spatial and channel information. Step 2 demonstrates the computation of the refined 'RFD loss' through the integration of spatial and channel losses. Step 3 presents the process of transferring the enhanced feature knowledge into the quantized model using the 'RFD loss' in a zero-shot quantization setting.
Figure 2: An illustration of Refined Feature Distillation. The distillation process is applied independently to each of the $n$ layers, and the resulting losses are averaged to compute the final $L_{RFD}$ loss.
Figure 3: Curvature based on the Hessian trace in 3-bit and 4-bit quantization.
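
The per-layer structure described in the captions of Figures 1 and 2 (decompose each feature map into spatial and channel information, combine the two losses, then average over the $n$ layers) can be sketched in NumPy as follows. The concrete choices here are assumptions for illustration, not the paper's definitions: spatial information is taken as the channel-wise mean of squared activations, channel information as the spatial mean of squared activations, both are L2-normalized, each is compared with a mean squared error, and the two terms are simply summed. All function names are hypothetical.

```python
import numpy as np


def spatial_map(feature):
    """Assumed spatial information: mean of squared activations over
    channels, flattened to (H*W,) and L2-normalized."""
    attn = np.mean(feature ** 2, axis=0).reshape(-1)
    return attn / (np.linalg.norm(attn) + 1e-12)


def channel_vector(feature):
    """Assumed channel information: mean of squared activations over
    spatial positions, shape (C,), L2-normalized."""
    attn = np.mean(feature ** 2, axis=(1, 2))
    return attn / (np.linalg.norm(attn) + 1e-12)


def layer_rfd_loss(fp_feature, q_feature):
    """Spatial loss plus channel loss for one layer (sum is an assumption)."""
    spatial = np.mean((spatial_map(fp_feature) - spatial_map(q_feature)) ** 2)
    channel = np.mean((channel_vector(fp_feature) - channel_vector(q_feature)) ** 2)
    return spatial + channel


def rfd_loss(fp_features, q_features):
    """Average the per-layer losses over all n layers, as in Figure 2."""
    if len(fp_features) != len(q_features):
        raise ValueError("teacher and student must expose the same layers")
    per_layer = [layer_rfd_loss(f, q) for f, q in zip(fp_features, q_features)]
    return float(np.mean(per_layer))


# Toy feature maps of shape (C, H, W) for two layers; the "quantized"
# features are the full-precision ones plus small noise.
rng = np.random.default_rng(0)
fp_feats = [rng.standard_normal((8, 4, 4)), rng.standard_normal((16, 2, 2))]
q_feats = [f + 0.1 * rng.standard_normal(f.shape) for f in fp_feats]
loss = rfd_loss(fp_feats, q_feats)
```

Because each layer's feature map is reduced to a spatial map and a channel vector before comparison, the loss stays small in dimension even for wide layers, and identical teacher and student features give a loss of exactly zero.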
