NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation

Minh-Tuan Tran; Trung Le; Xuan-May Le; Mehrtash Harandi; Quan Hung Tran; Dinh Phung

TL;DR

NAYER is a novel Noisy Layer Generation method that relocates the random source from the input to a noisy layer and uses the meaningful, constant label-text embedding (LTE) as the input. It runs 5 to 15 times faster than previous approaches.

Abstract

Data-Free Knowledge Distillation (DFKD) has made significant recent strides by transferring knowledge from a teacher neural network to a student neural network without accessing the original data. Nonetheless, existing approaches encounter a significant challenge when attempting to generate samples from random noise inputs, which inherently lack meaningful information. Consequently, these models struggle to map this noise to the ground-truth sample distribution, resulting in prolonged training times and low-quality outputs. In this paper, we propose a novel Noisy Layer Generation method (NAYER), which relocates the random source from the input to a noisy layer and uses the meaningful constant label-text embedding (LTE) as the input. The LTE is generated by running the language model once and is then stored in memory for all subsequent training processes. The significance of LTE lies in its ability to carry substantial meaningful inter-class information, enabling the generation of high-quality samples with only a few training steps. At the same time, the noisy layer addresses the issue of diversity in sample generation by preventing the model from overemphasizing the constrained label information. By reinitializing the noisy layer in each iteration, we aim to facilitate the generation of diverse samples while still retaining the method's efficiency, thanks to the ease of learning provided by LTE. Experiments carried out on multiple datasets demonstrate that NAYER not only outperforms the state-of-the-art methods but also runs 5 to 15 times faster than previous approaches. The code is available at https://github.com/tmtuan1307/nayer.
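
The data flow described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the dimensions, the helper names (`random_matrix`, `matvec`, `build_lte_pool`, `generate`), the `tanh` nonlinearity, and the use of random vectors in place of a real text encoder are all assumptions made for illustration, and no teacher, student, or loss is included. It only shows the two ideas stated above: the LTE is computed once and kept constant, and the noisy layer is reinitialized in every iteration so that the same LTE yields different samples.

```python
import math
import random

EMBED_DIM = 8    # size of the (hypothetical) label-text embedding
HIDDEN_DIM = 8   # output size of the noisy layer
IMAGE_DIM = 16   # flattened "image" size for this toy example


def random_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small Gaussian weights."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def build_lte_pool(class_names, rng):
    """Stand-in for the one-time text-encoder pass: one fixed vector per class.

    A real setup would call a language model here; random vectors are an
    assumption used only to keep the sketch self-contained.
    """
    return {name: [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
            for name in class_names}


def generate(lte, noisy_layer, generator):
    """LTE -> noisy layer -> generator -> synthetic sample."""
    hidden = [math.tanh(h) for h in matvec(noisy_layer, lte)]
    return matvec(generator, hidden)


rng = random.Random(0)
lte_pool = build_lte_pool(["car", "cat", "dog", "truck"], rng)  # computed once
generator = random_matrix(IMAGE_DIM, HIDDEN_DIM, rng)           # kept across iterations

samples = []
for iteration in range(3):
    # The noisy layer is the only random source: reinitialize it every iteration.
    noisy_layer = random_matrix(HIDDEN_DIM, EMBED_DIM, rng)
    samples.append(generate(lte_pool["cat"], noisy_layer, generator))
    # In actual training, the generator, noisy layer, and student would be
    # optimized here; that step is omitted in this sketch.
```

Because the input `lte_pool["cat"]` never changes, any variation between the entries of `samples` comes solely from the freshly initialized noisy layer.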

Paper Structure (28 sections, 5 equations, 6 figures, 17 tables, 1 algorithm)

Introduction
Related Work
Proposed Method
Problem Formulation
Label-Text Embedding as Generator's Input
Generating Diverse Samples with Noisy Layer
Generator and Student Updating
Experiments
Experimental Settings
Results and Analysis
Ablation Study
Further Analysis
Conclusion
Training Details
Teacher Model Training Details
...and 13 more sections

Figures (6)

Figure 1: Accuracy of student models and GPU hours of training time on the CIFAR-100 dataset. All variants of our method NAYER not only attain the highest accuracies but also accelerate the training process by 5 to 15 times compared to DeepInv.
Figure 2: Data Generation Strategies: (a) Classic method which optimizes random noise (z); (b) Using one noisy layer for generating one synthetic image from the label-text embedding (${\bm{e}}_{{\bm{y}}}$); (c) Using one noisy layer to generate multiple synthetic images.
Figure 3: (a) Random noise for data generation. (b) One-hot labels only uniformly distinguish labels, lacking inter-class relationships. In contrast, (c) LTE captures inter-class connections, bringing similar classes closer in the embedding space. This proximity enhances the similarity between the input and ground-truth sample distributions, thereby allowing the model to more easily mimic the ground-truth distribution and accelerating the learning process. (d) The average magnitude of the weights used to learn LTE is much larger than that of the weights for random noise, highlighting the model's negative focus on label information while ignoring random noise.
Figure 4: General Architecture of Noisy Layer Generation for Data-free Knowledge Distillation: NAYER initially employs the text encoder to generate the LTEs, which are then stored in the memory pool for model training. In each training batch, the LTEs serve as input for the noisy layer $\mathcal{Z}$ and generator $\mathcal{G}$ to produce synthetic images. Finally, these images are used for the joint training of the generator, noisy layer, and student network using Eq. \ref{eq:lz} and Eq. \ref{eq:ls_m}.
Figure 5: t-SNE Visualization of Label-Text Embedding and Ground-Truth Dataset Distribution for Four Classes: Car, Cat, Dog, and Truck.
...and 1 more figures
