FedDistill: Global Model Distillation for Local Model De-Biasing in Non-IID Federated Learning
Changlin Song, Divya Saxena, Jiannong Cao, Yuqing Zhao
TL;DR
The paper tackles model forgetting and reduced generalization in non-IID federated learning caused by imbalanced local data. It introduces FedDistill, a framework that combines group distillation with a decomposition of the global model into a generalized feature extractor and classifier to guide local models without extra communication or privacy costs. A three-part group distillation loss (true-class, few-sample, rich-sample) targets underrepresented classes and improves robustness to data imbalance. Extensive experiments on MNIST, CIFAR10, and CIFAR100 show state-of-the-art accuracy, reduced forgetting, and faster convergence compared to strong FL baselines, highlighting FedDistill’s practical impact for robust, privacy-preserving distributed learning.
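To make the three-part structure of the loss concrete, here is a minimal sketch in plain Python. It is not the paper's formulation, which this summary does not give. It assumes each term is a KL divergence between global (teacher) and local (student) predictions: a binary term for the true class, and one renormalized term each for the few-sample and rich-sample groups. The function names, the renormalization, and the `weights` parameter are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def binary_kl(t, s, eps=1e-12):
    """KL divergence between Bernoulli(t) and Bernoulli(s)."""
    t = min(max(t, eps), 1.0 - eps)
    s = min(max(s, eps), 1.0 - eps)
    return t * math.log(t / s) + (1.0 - t) * math.log((1.0 - t) / (1.0 - s))

def subset_kl(teacher, student, indices, eps=1e-12):
    """KL(teacher || student) restricted to `indices`, renormalized within the subset."""
    if not indices:
        return 0.0
    t_mass = max(sum(teacher[i] for i in indices), eps)
    s_mass = max(sum(student[i] for i in indices), eps)
    kl = 0.0
    for i in indices:
        t = max(teacher[i] / t_mass, eps)
        s = max(student[i] / s_mass, eps)
        kl += t * math.log(t / s)
    return kl

def group_distillation_loss(teacher_logits, student_logits, true_class,
                            few_classes, rich_classes, weights=(1.0, 1.0, 1.0)):
    """Hypothetical three-part loss: true-class, few-sample, and rich-sample terms.

    ASSUMPTION: the weighting and the per-group KL form are illustrative only.
    """
    t = softmax(teacher_logits)
    s = softmax(student_logits)
    # The true class gets its own term, so exclude it from both groups.
    few = [c for c in few_classes if c != true_class]
    rich = [c for c in rich_classes if c != true_class]
    w_true, w_few, w_rich = weights
    return (w_true * binary_kl(t[true_class], s[true_class])
            + w_few * subset_kl(t, s, few)
            + w_rich * subset_kl(t, s, rich))

# Usage: up-weight the few-sample term so underrepresented classes dominate.
loss = group_distillation_loss(
    teacher_logits=[2.0, 1.0, 0.5, -1.0],
    student_logits=[0.1, 3.0, -2.0, 0.5],
    true_class=0, few_classes=[2, 3], rich_classes=[0, 1],
    weights=(1.0, 2.0, 0.5),
)
```

Separating the terms lets the few-sample group be weighted more heavily than the rich-sample group. That is one plausible way to target underrepresented classes, under the assumptions above.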
Abstract
Federated Learning (FL) enables collaborative machine learning while preserving data privacy by leveraging models trained on decentralized devices. However, FL is challenged by data that is not independent and identically distributed (non-IID) across clients, which degrades model performance and generalization. To tackle the non-IID issue, recent efforts have used the global model as a teacher for local models. However, our pilot study shows that their effectiveness is constrained by imbalanced data distribution, which induces biases in local models and leads to a 'local forgetting' phenomenon, where the ability of models to generalize degrades over time, particularly for underrepresented classes. This paper introduces FedDistill, a framework that enhances knowledge transfer from the global model to local models, with a focus on imbalanced class distribution. Specifically, FedDistill employs group distillation, which segments classes by their frequency in each local dataset so that distillation concentrates on classes with fewer samples. Additionally, FedDistill decomposes the global model into a feature extractor and a classifier. This separation gives local models more generalized data representations and more accurate classification across all classes. FedDistill thereby mitigates the adverse effects of data imbalance, so that local models do not forget underrepresented classes but instead become better at recognizing and classifying them. Our comprehensive experiments demonstrate FedDistill's effectiveness, surpassing existing baselines in accuracy and convergence speed across several benchmark datasets.
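The grouping step described above, segmenting classes by their frequency in a client's local dataset, can be sketched as follows. The abstract does not state the criterion that separates few-sample from rich-sample classes, so the cutoff used here (a multiple of the mean per-class count) is an assumption. The function name and the `threshold_ratio` parameter are hypothetical.

```python
from collections import Counter

def split_classes_by_frequency(labels, num_classes, threshold_ratio=1.0):
    """Partition class ids into (few_sample, rich_sample) groups for one client.

    ASSUMPTION: a class counts as 'few-sample' when its local count falls below
    threshold_ratio * (mean count per class); the paper's rule may differ.
    """
    counts = Counter(labels)
    cutoff = threshold_ratio * len(labels) / num_classes
    few = [c for c in range(num_classes) if counts.get(c, 0) < cutoff]
    rich = [c for c in range(num_classes) if counts.get(c, 0) >= cutoff]
    return few, rich

# Usage: a skewed client holding mostly classes 0 and 1, and no samples of class 4.
local_labels = [0] * 50 + [1] * 45 + [2] * 3 + [3] * 2
few_classes, rich_classes = split_classes_by_frequency(local_labels, num_classes=5)
```

Classes absent from the client's data land in the few-sample group. That matches the stated goal of keeping local models from forgetting underrepresented classes.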
