Towards Personalized Federated Learning via Comprehensive Knowledge Distillation

Pengju Wang, Bochao Liu, Weijia Guo, Yong Li, Shiming Ge

TL;DR

This paper presents a novel personalized federated learning method that uses global and historical models as teachers and the local model as the student to facilitate comprehensive knowledge distillation, thus mitigating catastrophic forgetting and enhancing the general performance of personalized models.

Abstract

Federated learning is a distributed machine learning paradigm designed to protect data privacy. However, data heterogeneity across various clients results in catastrophic forgetting, where the model rapidly forgets previous knowledge while acquiring new knowledge. To address this challenge, personalized federated learning has emerged to customize a personalized model for each client. However, the inherent limitation of this mechanism is its excessive focus on personalization, potentially hindering the generalization of those models. In this paper, we present a novel personalized federated learning method that uses global and historical models as teachers and the local model as the student to facilitate comprehensive knowledge distillation. The historical model represents the local model from the last round of client training, containing historical personalized knowledge, while the global model represents the aggregated model from the last round of server aggregation, containing global generalized knowledge. By applying knowledge distillation, we effectively transfer global generalized knowledge and historical personalized knowledge to the local model, thus mitigating catastrophic forgetting and enhancing the general performance of personalized models. Extensive experimental results demonstrate the significant advantages of our method.
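The two-teacher distillation described above can be illustrated with a small sketch. It is not the paper's exact objective: the loss form (cross-entropy plus two KL terms), the weights `alpha` and `beta`, the `temperature`, and all function names are assumptions chosen for illustration. The global model and the historical model act as teachers, and the local model is the student.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(logits, label):
    """Standard cross-entropy of the student's prediction against the true label."""
    return -math.log(softmax(logits)[label] + 1e-12)


def comprehensive_kd_loss(local_logits, global_logits, hist_logits, label,
                          alpha=0.5, beta=0.5, temperature=2.0):
    """Hypothetical combined loss: supervised term plus two distillation terms.

    alpha weights the global (generalized) teacher, beta the historical
    (personalized) teacher. Both weights and the temperature are assumed values.
    """
    student = softmax(local_logits, temperature)
    teacher_global = softmax(global_logits, temperature)
    teacher_hist = softmax(hist_logits, temperature)
    supervised = cross_entropy(local_logits, label)
    kd_global = kl_divergence(teacher_global, student)
    kd_hist = kl_divergence(teacher_hist, student)
    return supervised + alpha * kd_global + beta * kd_hist


# Example: one sample, three classes, true label 0.
loss = comprehensive_kd_loss(
    local_logits=[2.0, 0.5, -1.0],
    global_logits=[1.5, 1.0, -0.5],
    hist_logits=[2.5, 0.0, -1.5],
    label=0,
)
print(f"combined loss: {loss:.4f}")
```

In a full training loop, the client would minimize a loss of this kind over its local data and then keep a copy of the trained local model to serve as the historical teacher in the next round.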

Paper Structure

This paper contains 11 sections, 4 equations, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: Data heterogeneity in FL leads to catastrophic forgetting.
  • Figure 2: The framework of our method. The client update phase is divided into local distillation and local store processes. During the local distillation process, a comprehensive knowledge distillation is performed, transferring knowledge from the global model $\bm{w}^{t-1}_g$ and the historical model $\bm{w}^{t-1}_h$ to the local model $\bm{w}^{t-1}_k$. During the local store process, the emphasis is on preserving the local model for the next round, $\bm{w}^{t}_h\leftarrow\bm{w}^{t}_k$. Different colors are used to differentiate between knowledge types, where blue represents generalized knowledge and other colors indicate personalized knowledge.
  • Figure 3: Data heterogeneity among $20$ clients on the CIFAR100 dataset.
  • Figure 4: Learning curves under different experimental settings.
  • Figure 5: Accuracy difference (%) among $20$ clients on the CIFAR100 dataset.