Good Teachers Explain: Explanation-Enhanced Knowledge Distillation
Amin Parchami-Araghi, Moritz Böhle, Sukrut Rao, Bernt Schiele
TL;DR
Standard knowledge distillation (KD) may not faithfully transfer the teacher's reasoning to the student. This work introduces explanation-enhanced KD (e$^2$KD), which adds a loss term $L_{exp}$ that enforces similarity between teacher and student explanations in addition to matching logits. The framework is model-agnostic and is defined by $L_{KD} = \tau^2 D_{KL}(p_T(x;\tau) \,\|\, p_S(x;\tau))$ and $L = L_{KD} + \lambda L_{exp}$, where $L_{exp} = 1 - \mathrm{sim}(E(T, x, \hat{y}_T), E(S, x, \hat{y}_T))$. Here $\tau$ is the distillation temperature, $\lambda$ weights the explanation term, and $E(M, x, \hat{y}_T)$ is the explanation of model $M$ on input $x$ for the teacher's predicted class $\hat{y}_T$. The approach yields consistent improvements in accuracy and teacher-student agreement, promotes being "right for the right reasons", and preserves or transfers interpretability across architectures and tasks (ImageNet, Waterbirds, and VOC), even with limited data and with approximate, pre-computed ("frozen") explanations. Overall, e$^2$KD is a simple, effective, and robust enhancement to KD, with practical value for faithful model distillation and interpretability-focused applications.
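To make the objective concrete, the following is a minimal NumPy sketch of the combined loss $L = L_{KD} + \lambda L_{exp}$, not the authors' implementation. It assumes that $\mathrm{sim}$ is cosine similarity over flattened explanation maps (the summary above does not specify the similarity measure) and that the explanations for the teacher's predicted class have already been computed by some attribution method and are passed in as arrays. The function names (`kd_loss`, `explanation_loss`, `e2kd_loss`) are hypothetical.

```python
import numpy as np

def softmax(logits, tau=1.0):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(teacher_logits, student_logits, tau=1.0):
    """L_KD = tau^2 * KL(p_T(x; tau) || p_S(x; tau)), averaged over the batch."""
    p_t = softmax(teacher_logits, tau)
    p_s = softmax(student_logits, tau)
    eps = 1e-12  # guards against log(0)
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1)
    return float(tau ** 2 * kl.mean())

def explanation_loss(teacher_expl, student_expl):
    """L_exp = 1 - sim(E_T, E_S), averaged over the batch.

    Assumption: sim is cosine similarity between flattened explanations.
    Both inputs have shape (batch, ...) and are explanations for the
    teacher's predicted class.
    """
    t = teacher_expl.reshape(len(teacher_expl), -1)
    s = student_expl.reshape(len(student_expl), -1)
    eps = 1e-12  # guards against division by zero
    cos = np.sum(t * s, axis=1) / (
        np.linalg.norm(t, axis=1) * np.linalg.norm(s, axis=1) + eps
    )
    return float(np.mean(1.0 - cos))

def e2kd_loss(teacher_logits, student_logits, teacher_expl, student_expl,
              tau=1.0, lam=1.0):
    """L = L_KD + lambda * L_exp."""
    return (kd_loss(teacher_logits, student_logits, tau)
            + lam * explanation_loss(teacher_expl, student_expl))
```

In an actual training loop the student's explanation would need to be differentiable with respect to the student's parameters, so an autodiff framework would replace NumPy; the arithmetic of the two terms stays the same.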
Abstract
Knowledge Distillation (KD) has proven effective for compressing large teacher models into smaller student models. While it is well known that student models can achieve accuracies similar to those of their teachers, it has also been shown that they nonetheless often do not learn the same function. It is, however, often highly desirable that the student's and teacher's functions share similar properties, such as basing the prediction on the same input features, as this ensures that students learn the 'right features' from the teachers. In this work, we explore whether this can be achieved by optimizing not only the classic KD loss but also the similarity of the explanations generated by the teacher and the student. Despite the idea being simple and intuitive, we find that our proposed 'explanation-enhanced' KD (e$^2$KD) (1) consistently provides large gains in terms of accuracy and student-teacher agreement, (2) ensures that the student learns from the teacher to be right for the right reasons and to give similar explanations, and (3) is robust with respect to the model architectures and the amount of training data, and even works with 'approximate', pre-computed explanations.
