What is Left After Distillation? How Knowledge Transfer Impacts Fairness and Bias

Aida Mohammadshahi, Yani Ioannou

TL;DR

This paper investigates how knowledge distillation (KD) affects class-wise bias and fairness in deep neural networks, revealing that even on balanced datasets a substantial fraction of classes can exhibit statistically significant accuracy changes after distillation. It introduces a structured methodology for measuring class-level bias and applies two group fairness metrics (Demographic Parity and Equalized Odds) as well as an individual fairness criterion, showing that the distillation temperature $T$ strongly modulates these effects. Across image and language datasets, higher temperatures can improve the distilled student's fairness, and the student can even surpass the teacher on some fairness metrics, though extremely high temperatures may reduce the information conveyed by the teacher and degrade both accuracy and fairness. The work highlights non-uniform, dataset-dependent shifts in bias under KD and argues for cautious deployment in sensitive domains, balancing accuracy against fairness objectives and encouraging further research into temperature-driven fairness dynamics.
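To make the role of the temperature concrete, the following is a minimal sketch of temperature-scaled softmax, the standard way teacher outputs are softened in knowledge distillation. The logits, the function names `softmax_with_temperature` and `entropy`, and the chosen temperatures are illustrative assumptions, not values or code from the paper.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means closer to uniform."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical teacher logits for a 4-class problem (assumption, not paper data).
teacher_logits = [6.0, 2.0, 1.0, -1.0]

for T in (1, 5, 10, 40):
    probs = softmax_with_temperature(teacher_logits, T)
    print(T, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

As $T$ grows, the printed distribution flattens and its entropy approaches $\ln 4$, which illustrates why a very high temperature leaves the student with little class-specific signal from the teacher.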

Abstract

Knowledge Distillation is a commonly used Deep Neural Network (DNN) compression method, which often maintains overall generalization performance. However, we show that even for balanced image classification datasets, such as CIFAR-100, Tiny ImageNet and ImageNet, as many as 41% of the classes are statistically significantly affected by distillation when comparing class-wise accuracy (i.e. class bias) between a teacher/distilled student or distilled student/non-distilled student model. Changes in class bias are not necessarily an undesirable outcome when considered outside of the context of a model's usage. Using two common fairness metrics, Demographic Parity Difference (DPD) and Equalized Odds Difference (EOD) on models trained with the CelebA, Trifeature, and HateXplain datasets, our results suggest that increasing the distillation temperature improves the distilled student model's fairness, and the distilled student fairness can even surpass the fairness of the teacher model at high temperatures. Additionally, we examine individual fairness, ensuring similar instances receive similar predictions. Our results confirm that higher temperatures also improve the distilled student model's individual fairness. This study highlights the uneven effects of distillation on certain classes and its potentially significant role in fairness, emphasizing that caution is warranted when using distilled models for sensitive application domains.
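As a rough illustration of the two group fairness metrics named above, the sketch below computes DPD and EOD for binary predictions. It assumes the commonly used definitions (DPD as the largest gap in positive-prediction rate across groups; EOD as the larger of the true-positive-rate gap and the false-positive-rate gap), which may differ in detail from the paper's exact formulation. The function names and toy data are hypothetical.

```python
def _rate(preds):
    """Fraction of positive (1) predictions; 0.0 for an empty list."""
    return sum(preds) / len(preds) if preds else 0.0


def demographic_parity_difference(y_pred, groups):
    """Max minus min positive-prediction rate across demographic groups."""
    by_group = {}
    for p, g in zip(y_pred, groups):
        by_group.setdefault(g, []).append(p)
    rates = [_rate(v) for v in by_group.values()]
    return max(rates) - min(rates)


def equalized_odds_difference(y_true, y_pred, groups):
    """Larger of the TPR gap and the FPR gap across demographic groups."""
    gaps = []
    for label in (1, 0):  # label 1 gives the TPR gap, label 0 the FPR gap
        by_group = {}
        for t, p, g in zip(y_true, y_pred, groups):
            if t == label:
                by_group.setdefault(g, []).append(p)
        rates = [_rate(v) for v in by_group.values()]
        gaps.append(max(rates) - min(rates) if rates else 0.0)
    return max(gaps)


# Toy data (assumption): 8 samples, two demographic groups "a" and "b".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(demographic_parity_difference(y_pred, groups))      # 0.5
print(equalized_odds_difference(y_true, y_pred, groups))  # 0.5
```

Both metrics are 0 for a model that treats the groups identically under the respective criterion, so lower values indicate improved fairness, consistent with how the results are reported.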

Paper Structure

This paper contains 25 sections, 8 equations, 8 figures, 10 tables, 1 algorithm.

Figures (8)

  • Figure 1: Class-wise Bias and Distillation. Test Accuracies of ResNet-20 student models distilled from a ResNet-56 teacher on CIFAR-10 (left) and SVHN (right) over a range of temperatures $T$. Mean test accuracies are shown over five random initializations. Classes with statistically significant relative changes between the non-distilled student and the distilled student are noted with $\times$.
  • Figure 2: Class-wise Disagreement. Disagreement between a ResNet-56 teacher and ResNet-20 (left) non-distilled/(right) distilled student for (a) CIFAR-10 using $T=9$ and (b) SVHN using $T=7$. The diagonals are excluded since here both models predict the same class without any disagreement.
  • Figure 3: Temperature vs. Test Accuracy/Class Bias. Number of classes significantly affected between the non-distilled and distilled student (S.S.C.) and between the teacher and distilled student (T.S.C.) for (a) CIFAR-100 (ResNet-56/ResNet-20) and (b) ImageNet (ResNet-50/ResNet-18), with 100 and 1000 total classes respectively. As the temperature used for distillation increases up to $T=10$, the S.S.C. rises for both datasets. For ImageNet, T.S.C. decreases, while for CIFAR-100, it first decreases and then slightly increases. The changes in the distilled student's test accuracy over all classes are also depicted in the figure.
  • Figure 4: Evaluation of Fairness Metrics for Distilled Students in CV. EOD and DPD are reported in %, and lower values indicate improved fairness. (a) illustrates fairness metrics for the CelebA dataset with the 'smiling' label concerning the 'Young' demographic attribute and (b) concerning the 'Male' demographic attribute. (c) presents fairness metrics for the Trifeature dataset with the 'shape' label with regard to the 'color' attribute and (d) with regard to the 'texture' attribute. It is notable that the models are fairer for the Trifeature dataset compared to the CelebA dataset, with lower values in the metrics. The explanation lies in the fact that the Trifeature dataset maintains a balanced distribution of demographic attributes, while the CelebA dataset contains biases that mirror real-world disparities. As seen in the second column, the downward trend does not continue at very high temperatures ($T=20, 30, 40$), as the teacher model generates nearly uniform softmax outputs.
  • Figure 5: Evaluation of Fairness Metrics for Distilled Students in NLP. EOD and DPD are reported in %, and lower values indicate improved fairness. (a) illustrates fairness metrics for the HateXplain dataset concerning the 'gender' demographic attribute, and (b) with regard to the 'race' attribute. The teacher employed the BERT architecture, while the student used the DistilBERT architecture.
  • ...and 3 more figures
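
The class-wise disagreement shown in Figure 2 can be sketched as a count matrix over pairs of differing predictions. The snippet below is a minimal illustration under assumptions: the function name `disagreement_matrix` and the toy predictions are hypothetical, and the paper may normalize or aggregate the counts differently.

```python
def disagreement_matrix(teacher_preds, student_preds, num_classes):
    """Count samples where the teacher predicts class i and the student class j.

    Only disagreements (i != j) are counted, so the diagonal stays zero,
    mirroring the excluded diagonals in the figure.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, s in zip(teacher_preds, student_preds):
        if t != s:
            matrix[t][s] += 1
    return matrix


# Toy predicted class indices for six test samples (assumption, not paper data).
teacher_preds = [0, 0, 1, 2, 2, 1]
student_preds = [0, 1, 1, 2, 0, 1]

for row in disagreement_matrix(teacher_preds, student_preds, num_classes=3):
    print(row)
```

Comparing this matrix for a non-distilled student against the one for a distilled student shows which class pairs account for the change in agreement with the teacher.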