
AuG-KD: Anchor-Based Mixup Generation for Out-of-Domain Knowledge Distillation

Zihao Tang, Zheqi Lv, Shengyu Zhang, Yifan Zhou, Xinyu Duan, Fei Wu, Kun Kuang

TL;DR

This work proposes a simple but effective method AuG-KD that utilizes an uncertainty-guided and sample-specific anchor to align student-domain data with the teacher domain and leverages a generative method to progressively trade off the learning process between OOD knowledge distillation and domain-specific information learning via mixup learning.

Abstract

Due to privacy or patent concerns, a growing number of large models are released without granting access to their training data, making transferring their knowledge inefficient and problematic. In response, Data-Free Knowledge Distillation (DFKD) methods have emerged as direct solutions. However, models derived from DFKD suffer significant performance degradation in real-world applications, due to the discrepancy between teachers' training data and real-world scenarios (the student domain). The degradation stems from the portions of teachers' knowledge that are not applicable to the student domain. These portions are specific to the teacher domain and would undermine students' performance. Hence, selectively transferring teachers' appropriate knowledge becomes the primary challenge in DFKD. In this work, we propose a simple but effective method, AuG-KD. It utilizes an uncertainty-guided and sample-specific anchor to align student-domain data with the teacher domain and leverages a generative method to progressively trade off the learning process between out-of-domain (OOD) knowledge distillation and domain-specific information learning via mixup learning. Extensive experiments on 3 datasets and 8 settings demonstrate the stability and superiority of our approach. Code is available at https://github.com/IshiKura-a/AuG-KD.

Paper Structure

This paper contains 22 sections, 9 equations, 12 figures, 6 tables, 1 algorithm.

Figures (12)

  • Figure 1: Differences between KD, DFKD, and OOD-KD problems.
  • Figure 2: Overview of our proposed method, consisting of three major modules.
  • Figure 3: Different mixup samples generated in Module 3 for DSLR in Office-31, controlled by the stage factor $f\in[0,1]$. The value of $f$ determines the proximity of the samples to $D_t$ and $D_s$. A smaller value of $f$ indicates that the samples are closer to the teacher domain $D_t$, while a larger value of $f$ indicates that the samples are closer to the student domain $D_s$.
  • Figure 4: Grid study on hyperparameters $a$ and $b$ in Module 3. The red line is $b=1.0$, meaning no mixup data. The blue line portrays the performance of various $a$-$b$ settings. The light blue area symbolizes the range encompassing mean $\pm$ std.
  • Figure 5: Visualization Results on $z$
  • ...and 7 more figures
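
The role of the stage factor $f$ in the Figure 3 caption can be illustrated with a minimal sketch. This is not the paper's Module 3, which relies on a generative method. As a simplifying assumption, it swaps in plain linear mixup between a teacher-aligned sample and a student-domain sample, so that $f=0$ reproduces the teacher-side sample and $f=1$ reproduces the student-side sample. The function names `linear_mixup` and `linear_stage_schedule` are hypothetical, and the linear schedule is only one possible way to move $f$ progressively from 0 to 1.

```python
def linear_mixup(anchor_sample, student_sample, f):
    """Blend two equal-length feature vectors with stage factor f.

    ASSUMPTION: plain linear interpolation stands in for the paper's
    generative mixup. f = 0 returns the teacher-aligned (anchor) sample,
    f = 1 returns the student-domain sample.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("stage factor f must lie in [0, 1]")
    if len(anchor_sample) != len(student_sample):
        raise ValueError("samples must have the same length")
    return [(1.0 - f) * a + f * s for a, s in zip(anchor_sample, student_sample)]


def linear_stage_schedule(num_steps):
    """Return num_steps values of f rising evenly from 0 to 1.

    ASSUMPTION: a linear ramp is used purely for illustration; the
    schedule used in the paper is not specified in this summary.
    """
    if num_steps < 2:
        raise ValueError("need at least two steps to go from 0 to 1")
    return [i / (num_steps - 1) for i in range(num_steps)]


if __name__ == "__main__":
    anchor = [0.0, 2.0]    # toy teacher-aligned sample
    student = [1.0, 4.0]   # toy student-domain sample
    for f in linear_stage_schedule(5):
        print(f, linear_mixup(anchor, student, f))
```

With this toy setup, early values of $f$ yield samples close to the teacher side and later values yield samples close to the student side, mirroring the progression described for Figure 3.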