Partial Knowledge Distillation for Alleviating the Inherent Inter-Class Discrepancy in Federated Learning
Xiaoyu Gan, Jingbo Jiang, Jingyang Zhu, Xiaomeng Wang, Xizi Chen, Chi-Ying Tsui
TL;DR
The paper addresses an inter-class accuracy discrepancy (ICD) that persists in federated learning even when data are balanced both globally and locally. It analyzes how inherent data properties create weak classes that are consistently confused with one another, and introduces Partial Knowledge Distillation (PKD), which trains class-group experts on subsets of weak classes and uses misclassification-triggered KL-divergence distillation to transfer their knowledge to the global model. PKD improves weak-class accuracy by up to 10.7% on FashionMNIST and 6.3% on CIFAR-10 and reduces ICD across multiple datasets, while maintaining or improving overall accuracy; the overhead from expert training and selective distillation remains modest. The approach makes FL more robust to intrinsic class confusion without resorting to data resampling or heavy architectural changes, which makes it practical for real-world, heterogeneous deployments.
Abstract
Substantial effort has been devoted to alleviating the impact of long-tailed class distributions in federated learning. In this work, we observe that certain weak classes consistently exist even in class-balanced learning. Unlike the minority classes studied in previous work, these weak classes are inherent to the data and remain fairly consistent across network structures, learning paradigms, and data partitioning methods. The inherent inter-class accuracy discrepancy can exceed 36.9% for federated learning on the FashionMNIST and CIFAR-10 datasets, even when the class distribution is balanced both globally and locally. We empirically analyze the potential cause of this phenomenon and propose a partial knowledge distillation (PKD) method to improve the model's classification accuracy on weak classes. In this approach, knowledge transfer is initiated only when specific misclassifications occur within certain weak classes. Experimental results show that the accuracy of weak classes can be improved by 10.7%, effectively reducing the inherent inter-class discrepancy.
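To make the triggering idea in the abstract concrete, the following is a minimal per-sample sketch and not the authors' implementation. It rests on several assumptions that the abstract does not state: the trigger fires when a sample whose true label lies in a weak-class group is predicted as a different class of the same group; the expert outputs logits only over that group; and the hypothetical names `partial_kd_loss`, `weak_group`, `temperature`, and `alpha` are chosen purely for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def partial_kd_loss(student_logits, expert_logits, label, weak_group,
                    temperature=2.0, alpha=0.5):
    """Cross-entropy plus a KD term that is active only when triggered.

    Assumed trigger: the true label is in `weak_group` and the student
    predicts a *different* class that is also in `weak_group`.
    `expert_logits` is assumed to be ordered like sorted(weak_group).
    Returns (loss, triggered).
    """
    probs = softmax(student_logits)
    ce = -math.log(probs[label] + 1e-12)

    predicted = max(range(len(student_logits)),
                    key=lambda c: student_logits[c])
    triggered = (label in weak_group
                 and predicted in weak_group
                 and predicted != label)
    if not triggered:
        return ce, False

    # Compare student and expert only over the weak-class group.
    group = sorted(weak_group)
    student_sub = softmax([student_logits[c] for c in group], temperature)
    teacher = softmax(expert_logits, temperature)
    kd = kl_divergence(teacher, student_sub) * temperature ** 2
    return ce + alpha * kd, True


# Usage: classes 1 and 2 form a hypothetical weak group.
student = [0.1, 2.0, 1.5, 0.0]   # student wrongly prefers class 1
expert = [0.0, 3.0]              # expert over [1, 2] prefers class 2
loss, fired = partial_kd_loss(student, expert, label=2, weak_group={1, 2})
print(fired, round(loss, 4))
```

Restricting the distillation term to triggered samples is what makes the transfer "partial" in this sketch: correctly classified samples, and errors that fall outside the weak group, contribute only the ordinary cross-entropy term.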
