Dynamic Temperature Knowledge Distillation
Yukang Wei, Yu Bai
TL;DR
DTKD introduces per-sample dynamic temperatures in knowledge distillation by minimizing the difference in sharpness between teacher and student logits, where $\text{sharpness}(\mathbf{z}) = \log\sum_i e^{z_i}$. The method computes the temperatures $T_{tea}$ and $T_{stu}$ from per-sample logit magnitudes to align the two output distributions, and trains with the total loss $\mathcal{L}_{KD}=\alpha\mathcal{L}_{DTKD}+\beta\mathcal{L}_{KL}+\gamma\mathcal{L}_{CE}$. Experiments on CIFAR-100 and ImageNet show accuracy competitive with leading KD methods, with added robustness in the Target Class KD and None-target Class KD settings, at minimal additional training cost. DTKD is simple to implement and compatible with existing KD variants such as DKD, offering a practical boost to knowledge transfer across varied teacher-student configurations.
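The sharpness metric and the per-sample temperatures described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the proportional split of a temperature budget `2 * base_temp`, the loss scaling factors, and all function names (`dynamic_temperatures`, `dtkd_style_loss`) are assumptions made for illustration. Only the log-sum-exp definition of sharpness and the weighted three-term loss come from the text.

```python
import math


def sharpness(logits):
    """Log-sum-exp of a logit vector, computed in a numerically stable way."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))


def softmax(logits, temp=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temp for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def dynamic_temperatures(teacher_logits, student_logits, base_temp=4.0):
    """Assumed rule: split a budget of 2 * base_temp in proportion to sharpness.

    The sharper (larger log-sum-exp) output receives the higher temperature,
    which pulls the two tempered distributions toward similar sharpness.
    Assumes both sharpness values are positive.
    """
    x = sharpness(teacher_logits)
    y = sharpness(student_logits)
    t_tea = 2.0 * x / (x + y) * base_temp
    t_stu = 2.0 * y / (x + y) * base_temp
    return t_tea, t_stu


def dtkd_style_loss(teacher_logits, student_logits, label,
                    base_temp=4.0, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum alpha * L_DTKD + beta * L_KL + gamma * L_CE for one sample.

    The scaling factors (t_tea * t_stu and base_temp ** 2) are assumptions.
    """
    t_tea, t_stu = dynamic_temperatures(teacher_logits, student_logits, base_temp)

    # Dynamic-temperature term: each model is softened with its own temperature.
    l_dtkd = t_tea * t_stu * kl_divergence(
        softmax(teacher_logits, t_tea), softmax(student_logits, t_stu))

    # Fixed-temperature KD term, as in standard knowledge distillation.
    l_kl = base_temp ** 2 * kl_divergence(
        softmax(teacher_logits, base_temp), softmax(student_logits, base_temp))

    # Cross-entropy of the untempered student prediction against the label.
    l_ce = -math.log(softmax(student_logits)[label])

    return alpha * l_dtkd + beta * l_kl + gamma * l_ce


if __name__ == "__main__":
    teacher = [8.0, 2.0, 0.0]   # confident teacher, large logits
    student = [3.0, 1.0, 0.0]   # less confident student
    t_tea, t_stu = dynamic_temperatures(teacher, student)
    print("T_tea, T_stu:", t_tea, t_stu)
    print("loss:", dtkd_style_loss(teacher, student, label=0))
```

Under this assumed rule the two temperatures always sum to `2 * base_temp`, so when teacher and student are equally sharp both fall back to the static temperature and the dynamic term coincides with the fixed-temperature KD term.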
Abstract
Temperature plays a pivotal role in moderating label softness in knowledge distillation (KD). Traditional approaches often employ a static temperature throughout the KD process, which fails to address the nuanced complexities of samples with varying levels of difficulty and overlooks the distinct capabilities of different teacher-student pairings. This leads to a less-than-ideal transfer of knowledge. To improve the process of knowledge propagation, we propose Dynamic Temperature Knowledge Distillation (DTKD), which introduces dynamic, cooperative temperature control for the teacher and student models simultaneously within each training iteration. In particular, we propose "sharpness" as a metric to quantify the smoothness of a model's output distribution. By minimizing the sharpness difference between the teacher and the student, we can derive a sample-specific temperature for each of them. Extensive experiments on CIFAR-100 and ImageNet-2012 demonstrate that DTKD performs comparably to leading KD techniques, with added robustness in Target Class KD and None-target Class KD scenarios. The code is available at https://github.com/JinYu1998/DTKD.
