Partial Knowledge Distillation for Alleviating the Inherent Inter-Class Discrepancy in Federated Learning

Xiaoyu Gan, Jingbo Jiang, Jingyang Zhu, Xiaomeng Wang, Xizi Chen, Chi-Ying Tsui

TL;DR

The paper addresses an inter-class accuracy discrepancy (ICD) in federated learning that persists even when the data are balanced both globally and locally. It analyzes how inherent data properties create weak classes that are consistently confused with one another, and introduces Partial Knowledge Distillation (PKD), which trains class-group experts on subsets of weak classes and uses misclassification-triggered KL-divergence distillation to transfer their knowledge to the global model. PKD improves weak-class accuracy (e.g., by up to $10.7\%$ on FashionMNIST and $6.3\%$ on CIFAR-10) and reduces ICD across multiple datasets while maintaining or improving overall accuracy; the overhead from expert training and selective KD remains modest. The approach makes FL more robust to intrinsic class confusion without resorting to data resampling or heavy architectural changes, which makes it practical for real-world, heterogeneous deployments.
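
To make "misclassification-triggered KL-divergence distillation" more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The function names, the temperature `T`, and the trigger rule (a sample whose true label lies in a weak-class group is predicted as a different class of the same group) are all assumptions made for illustration, as is the choice to compare the expert with the student's logits restricted to the group's classes.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Row-wise softmax with temperature T (numerically stabilized)."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def partial_kd_loss(student_logits, expert_logits, labels, weak_group, T=2.0):
    """Mean KL(expert || student) over the samples that trigger distillation.

    student_logits: (N, C) logits of the global model over all C classes.
    expert_logits:  (N, G) logits of a hypothetical class-group expert,
                    ordered like sorted(weak_group).
    Assumed trigger: the true label is in weak_group and the student
    predicts a different class that is also in weak_group.
    """
    group = np.asarray(sorted(weak_group))
    labels = np.asarray(labels)
    pred = student_logits.argmax(axis=1)
    trigger = np.isin(labels, group) & np.isin(pred, group) & (pred != labels)
    if not trigger.any():
        return 0.0, trigger
    eps = 1e-12  # guards against log(0)
    # Restrict the student to the expert's classes before comparing (assumption).
    p_student = softmax(student_logits[trigger][:, group], T)
    p_expert = softmax(expert_logits[trigger], T)
    kl = np.sum(p_expert * (np.log(p_expert + eps) - np.log(p_student + eps)), axis=1)
    # T**2 scaling is the usual convention for temperature-softened distillation.
    return float(kl.mean() * T * T), trigger

# Toy batch: 3 samples, 4 classes, weak-class group {2, 3}.
student = np.array([[0.1, 0.2, 1.0, 3.0],
                    [0.0, 0.1, 4.0, 1.0],
                    [0.5, 2.0, 0.1, 0.0]])
expert = np.array([[3.0, 0.5],
                   [2.0, 0.1],
                   [0.0, 0.0]])
labels = np.array([2, 2, 0])
loss, trigger = partial_kd_loss(student, expert, labels, weak_group=[2, 3])
```

In this toy batch only the first sample triggers distillation: its label is class 2 but the student predicts class 3, both inside the group. Samples that are classified correctly, or misclassified outside the group, contribute nothing to the loss, which is what keeps the distillation "partial".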

Abstract

Substantial efforts have been devoted to alleviating the impact of the long-tailed class distribution in federated learning. In this work, we observe an interesting phenomenon that certain weak classes consistently exist even for class-balanced learning. These weak classes, different from the minority classes in the previous works, are inherent to data and remain fairly consistent for various network structures, learning paradigms, and data partitioning methods. The inherent inter-class accuracy discrepancy can reach over 36.9% for federated learning on the FashionMNIST and CIFAR-10 datasets, even when the class distribution is balanced both globally and locally. In this study, we empirically analyze the potential reason for this phenomenon. Furthermore, a partial knowledge distillation (PKD) method is proposed to improve the model's classification accuracy for weak classes. In this approach, knowledge transfer is initiated upon the occurrence of specific misclassifications within certain weak classes. Experimental results show that the accuracy of weak classes can be improved by 10.7%, reducing the inherent inter-class discrepancy effectively.
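
The inter-class accuracy discrepancy quoted above can be read as the gap between the best and worst per-class accuracies. The sketch below computes it under that assumption; the function names are illustrative, and the paper's exact definition (denoted $\Delta_{max}$ in Figure 2) may differ in detail.

```python
import numpy as np

def class_wise_accuracy(y_true, y_pred, num_classes):
    """Accuracy per class; classes absent from y_true are reported as NaN."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            acc[c] = np.mean(y_pred[mask] == c)
    return acc

def inter_class_discrepancy(acc):
    """Assumed definition: highest minus lowest class-wise accuracy."""
    return float(np.nanmax(acc) - np.nanmin(acc))

# Balanced toy example: two samples per class, one error on class 1.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 2, 2]
acc = class_wise_accuracy(y_true, y_pred, num_classes=3)
icd = inter_class_discrepancy(acc)
```

Even with a perfectly balanced label distribution, a single class with lower accuracy dominates this metric, which is why it exposes weak classes that an overall accuracy figure would hide.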

Paper Structure

This paper contains 15 sections, 6 equations, 13 figures, 10 tables.

Figures (13)

  • Figure 1: (a) The Conventional Long-tailed Problem; (b) Inherent Inter-class Discrepancies Observed under Balanced Class Distributions (both Globally and Locally).
  • Figure 2: (a) The Class-wise Accuracy (with Mean Subtraction) Based on Different Learning Paradigms and Network Structures ($\Delta_{max}$ Refers to the Maximum Discrepancy after a Training Session; the Network Structures, such as VGG-9 [vggFedMA] and ConvNets [fedavg], are Detailed in Section 4.1.) (b) Sample Distribution in Different Scenarios (Assuming a Total of 10 Clients).
  • Figure 3: The Maximum, Average, and Minimum Class-wise Accuracy During Training.
  • Figure 4: FashionMNIST: (a) Raw Samples and the Similarities in High-Level Features between each Pair of Classes; (b) Output Probabilities (Averaged over 1000 Samples per Class); (c) Prediction Results. For Simplicity, each Class is Assigned a Serial Number. (0: T-shirt, 1: Trouser, 2: Pullover, 3: Dress, 4: Coat, 5: Sandal, 6: Shirt, 7: Sneaker, 8: Bag, 9: Ankle Boot.)
  • Figure 5: t-SNE Visualizations of Feature Vectors on (a) FashionMNIST and (b) CIFAR-10 (under Local Class-Balanced Scenario, 10 clients).
  • ...and 8 more figures