ReffAKD: Resource-efficient Autoencoder-based Knowledge Distillation

Divyang Doshi; Jung-Eun Kim

TL;DR

This work tackles the high computational cost of knowledge distillation (KD) by eliminating the need for a large teacher model. It introduces ReffAKD, which uses a compact convolutional autoencoder to learn class-relevant embeddings, builds a cosine-based class similarity matrix with a diagonal boost, and derives soft labels (AESL) from that matrix. The student is then trained with a dedicated loss, $L_{\mathrm{ReffAKD}}$, which combines a Kullback-Leibler divergence (KLD) term against the soft labels with hard-label supervision. Across CIFAR-100, Tiny ImageNet, and Fashion MNIST, ReffAKD achieves competitive or superior accuracy while substantially reducing resource usage (FLOPs, MACs, parameters, and memory) compared to vanilla KD with a teacher. The approach is compatible with existing logit-based KD techniques, scalable to edge devices, and extensible to other domains such as NLP, making knowledge distillation more accessible and cost-effective for practical deployment.
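
A loss that combines KLD with hard-label supervision could look roughly like the sketch below. It is an illustration, not the authors' implementation: the function name `reffakd_style_loss`, the convex weighting by `alpha`, and the way `temperature` is applied are assumptions borrowed from the usual vanilla KD formulation, and the precomputed soft label stands in for an AESL row.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def reffakd_style_loss(student_logits, soft_label, true_class,
                       alpha=0.5, temperature=1.0):
    """Hypothetical combined loss: alpha * KLD + (1 - alpha) * cross-entropy.

    The weighting scheme is an assumption, not taken from the paper.
    """
    student_soft = softmax(student_logits, temperature)
    distill = kl_divergence(soft_label, student_soft)
    # Hard-label term: cross-entropy of the unscaled student prediction.
    hard = -math.log(softmax(student_logits)[true_class] + 1e-12)
    return alpha * distill + (1.0 - alpha) * hard


# Toy example with three classes; the soft label plays the role of an AESL row.
logits = [2.0, 0.5, -1.0]
soft_label = [0.7, 0.2, 0.1]
loss = reffakd_style_loss(logits, soft_label, true_class=0,
                          alpha=0.5, temperature=2.0)
```

With `alpha=0.0` the sketch reduces to plain cross-entropy on the hard label, and with `alpha=1.0` the student is supervised by the soft labels alone.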

Abstract

In this research, we propose an innovative method to boost Knowledge Distillation efficiency without the need for resource-heavy teacher models. Knowledge Distillation trains a smaller "student" model with guidance from a larger "teacher" model, which is computationally costly. However, the main benefit comes from the soft labels provided by the teacher, which help the student grasp nuanced class similarities. In our work, we propose an efficient method for generating these soft labels, thereby eliminating the need for a large teacher model. We employ a compact autoencoder to extract essential features and calculate similarity scores between different classes. We then apply the softmax function to these similarity scores to obtain a soft probability vector, which serves as guidance during the training of the student model. Our extensive experiments on various datasets, including CIFAR-100, Tiny ImageNet, and Fashion MNIST, demonstrate the superior resource efficiency of our approach compared to traditional knowledge distillation methods that rely on large teacher models. Importantly, our approach consistently achieves similar or even superior model accuracy. We also perform a comparative study with various recently developed knowledge distillation techniques, showing that our approach achieves competitive performance while using significantly fewer resources. We also show that our approach can be easily added to any logit-based knowledge distillation method. This research contributes to making knowledge distillation more accessible and cost-effective for practical applications, making it a promising avenue for improving the efficiency of model training. The code for this work is available at https://github.com/JEKimLab/ReffAKD.
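
The soft-label step described above (class similarity scores followed by a softmax) can be sketched as follows. This is a minimal illustration under stated assumptions: the embeddings are random stand-ins for autoencoder bottleneck features, each class is summarized by its mean embedding, and the names `soft_labels`, `diagonal_boost`, and `temperature` along with their values are hypothetical rather than taken from the paper.

```python
import math
import random


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def softmax(scores, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_labels(embeddings, labels, num_classes,
                diagonal_boost=1.0, temperature=1.0):
    """Return one soft probability vector per class.

    Assumption: each class is represented by the mean of its embeddings,
    and every class has at least one sample.
    """
    centroids = [
        mean_vector([e for e, y in zip(embeddings, labels) if y == c])
        for c in range(num_classes)
    ]
    rows = []
    for i in range(num_classes):
        sims = [cosine(centroids[i], centroids[j]) for j in range(num_classes)]
        sims[i] += diagonal_boost  # keep the true class as the largest entry
        rows.append(softmax(sims, temperature))
    return rows


# Toy data: 4 classes, 10 random 16-dimensional "embeddings" per class.
random.seed(0)
labels = [c for c in range(4) for _ in range(10)]
embeddings = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in labels]
aesl = soft_labels(embeddings, labels, num_classes=4,
                   diagonal_boost=1.0, temperature=0.5)
```

Row `aesl[c]` would then be used as the soft target for every training image of class `c`, so the table is computed once and no teacher forward pass is needed during student training.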

Paper Structure (16 sections, 2 equations, 5 figures, 10 tables)

Introduction
Related Work
Methodology
Architecture
Soft Labels with Autoencoder
ReffAKD Loss Function
Evaluation
Experimental Setup
Accuracy on CIFAR-100
Accuracy on Tiny Imagenet
Accuracy on Fashion MNIST
Effects of Temperature and Alpha
Resource Consumption
Comparative analysis
Discussion and Future Work
...and 1 more sections

Figures (5)

Figure 1: Convolutional Autoencoder for CIFAR-100.
Figure 2: Vanilla knowledge distillation.
Figure 3: ReffAKD distillation.
Figure 4: Output probability distribution comparison for ResNet50 and ReffAKD for one instance of the CIFAR-100 dataset.
Figure 5: Accuracy and resource consumption comparisons of ReffAKD and vanilla KD teacher models for CIFAR-100 (first row), Tiny ImageNet (second row), and Fashion MNIST (third row). Note the Y-axes of resource consumption are in a log scale.
