FedDistill: Global Model Distillation for Local Model De-Biasing in Non-IID Federated Learning

Changlin Song, Divya Saxena, Jiannong Cao, Yuqing Zhao

TL;DR

The paper tackles model forgetting and reduced generalization in non-IID federated learning caused by imbalanced local data. It introduces FedDistill, a framework that combines group distillation with a decomposition of the global model into a generalized feature extractor and classifier to guide local models without extra communication or privacy costs. A three-part group distillation loss (true-class, few-sample, rich-sample) targets underrepresented classes and improves robustness to data imbalance. Extensive experiments on MNIST, CIFAR10, and CIFAR100 show state-of-the-art accuracy, reduced forgetting, and faster convergence compared to strong FL baselines, highlighting FedDistill’s practical impact for robust, privacy-preserving distributed learning.

Abstract

Federated Learning (FL) is a novel approach that allows for collaborative machine learning while preserving data privacy by leveraging models trained on decentralized devices. However, FL faces challenges due to non-uniformly distributed (non-iid) data across clients, which impacts model performance and its generalization capabilities. To tackle the non-iid issue, recent efforts have utilized the global model as a teaching mechanism for local models. However, our pilot study shows that their effectiveness is constrained by imbalanced data distribution, which induces biases in local models and leads to a 'local forgetting' phenomenon, where the ability of models to generalize degrades over time, particularly for underrepresented classes. This paper introduces FedDistill, a framework enhancing the knowledge transfer from the global model to local models, focusing on the issue of imbalanced class distribution. Specifically, FedDistill employs group distillation, segmenting classes based on their frequency in local datasets to facilitate a focused distillation process to classes with fewer samples. Additionally, FedDistill dissects the global model into a feature extractor and a classifier. This separation empowers local models with more generalized data representation capabilities and ensures more accurate classification across all classes. FedDistill mitigates the adverse effects of data imbalance, ensuring that local models do not forget underrepresented classes but instead become more adept at recognizing and classifying them accurately. Our comprehensive experiments demonstrate FedDistill's effectiveness, surpassing existing baselines in accuracy and convergence speed across several benchmark datasets.
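
The group distillation idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's exact loss. It assumes the per-class terms of the usual KL divergence between global (teacher) and local (student) softmax outputs are simply partitioned into a true-class part, a rich-sample part, and a few-sample part, and then reweighted. The function names, the count `threshold` used to separate few-sample from rich-sample classes, and the default weights are all hypothetical.

```python
import math
from collections import Counter

def split_classes(labels, num_classes, threshold):
    """Partition class ids into (few, rich) by local sample count.
    `threshold` is a hypothetical cutoff, not a value from the paper."""
    counts = Counter(labels)
    few = [c for c in range(num_classes) if counts[c] < threshold]
    rich = [c for c in range(num_classes) if counts[c] >= threshold]
    return few, rich

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_z for z in scaled]

def grouped_kd_loss(teacher_logits, student_logits, target, few, rich,
                    alpha_t=1.0, alpha_r=1.0, alpha_f=1.0, temperature=1.0):
    """Weighted sum of per-class KL(teacher || student) terms, grouped into
    true-class, rich-sample, and few-sample parts (assumed decomposition)."""
    lp = log_softmax(teacher_logits, temperature)  # global model (teacher)
    lq = log_softmax(student_logits, temperature)  # local model (student)
    term = [math.exp(a) * (a - b) for a, b in zip(lp, lq)]
    tc = term[target]                               # true-class part
    rc = sum(term[c] for c in rich if c != target)  # rich-sample part
    fc = sum(term[c] for c in few if c != target)   # few-sample part
    return alpha_t * tc + alpha_r * rc + alpha_f * fc

# Toy client: classes 0 and 1 dominate, classes 2-4 are rare or absent.
labels = [0] * 50 + [1] * 40 + [2] * 3 + [3] * 1
few, rich = split_classes(labels, num_classes=5, threshold=10)
teacher = [2.0, 1.0, 0.5, 0.2, 0.1]
student = [3.0, 1.5, -1.0, -1.5, -2.0]
loss = grouped_kd_loss(teacher, student, target=0, few=few, rich=rich,
                       alpha_f=2.0)  # emphasize few-sample classes
```

With all three weights set to 1, this sketch reduces to the ordinary KL divergence, so the weights act only as a reweighting of where the distillation signal is concentrated.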

Paper Structure (12 sections, 15 equations, 10 figures, 5 tables)

This paper contains 12 sections, 15 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Impact of imbalanced class distribution on a model's gradient updates. Gradient changes in a neural network across four scenarios (s1, s2, s3, s4) for ten classes (Class ID 0-9). The bars indicate gradient magnitudes, with colors representing different scenarios. Significant gradient increases for classes 3 and 9 in s2 and s4 point to shifts in learning focus due to class imbalance.
  • Figure 2: Before training (a), the model may not be well-informed and hence shows no strong bias towards any class. After training on an imbalanced dataset (b), the model exhibits a clear bias with significantly varied probabilities for each class. The elevated probability for certain classes suggests that the model has become more confident in these classes likely due to their overrepresentation in the training data. Conversely, the reduced probability for other classes indicates a loss of confidence, which can be interpreted as the model 'forgetting' or failing to recognize these underrepresented classes.
  • Figure 3: Illustration of softmax output distribution disparities between global and local models under imbalanced class distribution on client 0, highlighting the local model's inclination towards fitting its specific dataset and the global model's balanced approach.
  • Figure 4: Overview of the FedDistill framework, detailing the interplay between the global and local models' feature extractors and classifiers for an input $x_i$. The input $x_i$ is fed to both the global feature extractor $E_g$ and the local feature extractor $E_l$. The resulting features are then passed to the classifiers $FC_g$ and $FC_l$ of the global and local models. $\hat{y}_{gg}$ and $\hat{y}_{gl}$ denote the outputs of the global and local classifiers on the global features, respectively, and $\hat{y}_{lg}$ and $\hat{y}_{ll}$ denote the outputs of the global and local classifiers on the local features, respectively. $y_i$ is the ground-truth label.
  • Figure 5: Illustration of traditional KD and our G-KD. We reformulate traditional KD into three parts: (1) the target-class KL loss (TC-KD), which has been discussed in DKD [ref20]; (2) the rich-sample KL loss (RC-KD), the KL loss for the rich-sample classes that are highly represented by the local model; and (3) the few-sample KL loss (FC-KD), the KL loss for the few-sample classes that are underrepresented by the local model. By separating the classes, we intend to accommodate the imbalance in the local dataset distribution by adjusting the weights ($\alpha_t, \alpha_r, \alpha_f$) accordingly.
  • ...and 5 more figures
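
The four outputs named in the Figure 4 caption can be sketched as follows. This shows only how the feature extractors and classifiers are crossed. How the four outputs enter the training loss is not stated here and is left out. The function name `cross_outputs` and the toy stand-in models are hypothetical, and any callables mapping an input to features and features to logits would work in their place.

```python
def cross_outputs(x, E_g, FC_g, E_l, FC_l):
    """Return the four predictions from Figure 4.
    First subscript: which feature extractor produced the features.
    Second subscript: which classifier consumed them."""
    feat_g = E_g(x)  # global features
    feat_l = E_l(x)  # local features
    return {
        "y_gg": FC_g(feat_g),  # global classifier on global features
        "y_gl": FC_l(feat_g),  # local classifier on global features
        "y_lg": FC_g(feat_l),  # global classifier on local features
        "y_ll": FC_l(feat_l),  # local classifier on local features
    }

# Toy stand-ins that tag their input, so the data flow is easy to inspect.
E_g = lambda x: ("E_g", x)
E_l = lambda x: ("E_l", x)
FC_g = lambda f: ("FC_g", f)
FC_l = lambda f: ("FC_l", f)

outputs = cross_outputs("x_i", E_g, FC_g, E_l, FC_l)
```

Note that `y_gg` is simply the global model's own prediction and `y_ll` is the local model's own prediction, while `y_gl` and `y_lg` mix components of the two models.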