
Federated Learning with Label-Masking Distillation

Jianghu Lu, Shikun Li, Kexin Bao, Pengju Wang, Zhenxing Qian, Shiming Ge

TL;DR

This paper addresses label distribution skew in federated learning, where differing user behavior causes label distributions to vary significantly across clients. It proposes a label-masking distillation approach, FedLMD, that improves federated learning by taking each client's label distribution into account.

Abstract

Federated learning provides a privacy-preserving way to collaboratively train models on data distributed over multiple local clients, coordinated by a global server. In this paper, we focus on label distribution skew in federated learning, where differing user behavior causes label distributions to vary significantly across clients. In such cases, most existing methods reach suboptimal solutions because they make inadequate use of the label distribution information in clients. Motivated by this, we propose a label-masking distillation approach termed FedLMD, which facilitates federated learning by perceiving the label distribution of each client. We classify labels into majority and minority labels based on the number of examples per class during training. The client model learns the knowledge of majority labels from local data. The distillation process masks out the global model's predictions for majority labels, so that it can focus on preserving the client's minority label knowledge. A series of experiments shows that the proposed approach can achieve state-of-the-art performance in various cases. Moreover, considering the limited resources of clients, we propose a variant, FedLMD-Tf, that does not require an additional teacher and outperforms previous lightweight approaches without increasing computational cost. Our code is available at https://github.com/wnma3mz/FedLMD.
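The masking step described in the abstract can be illustrated with a minimal pure-Python sketch. This is one plausible reading of the idea, not the authors' implementation (see the linked repository for that). The function names, the `min_count` threshold used to decide which labels count as majority, and the choice of a KL divergence over renormalized minority-class probabilities are all assumptions made for illustration.

```python
import math
from collections import Counter


def majority_labels(local_labels, num_classes, min_count=1):
    """Return the set of classes treated as majority labels on this client.

    Assumption: a class is "majority" if it has at least `min_count`
    local examples; the paper's exact rule may differ.
    """
    counts = Counter(local_labels)
    return {c for c in range(num_classes) if counts[c] >= min_count}


def masked_log_softmax(logits, masked, temperature=1.0):
    """Log-softmax over the classes NOT in `masked` (at least one must remain)."""
    scaled = {i: z / temperature for i, z in enumerate(logits) if i not in masked}
    peak = max(scaled.values())  # subtract the max for numerical stability
    log_total = peak + math.log(sum(math.exp(v - peak) for v in scaled.values()))
    return {i: v - log_total for i, v in scaled.items()}


def lmd_loss(student_logits, teacher_logits, masked, temperature=1.0):
    """KL(teacher || student) restricted to the unmasked (minority) classes."""
    if all(i in masked for i in range(len(teacher_logits))):
        return 0.0  # every class is majority: nothing left to distill
    log_p = masked_log_softmax(teacher_logits, masked, temperature)  # global model
    log_q = masked_log_softmax(student_logits, masked, temperature)  # client model
    return sum(math.exp(log_p[i]) * (log_p[i] - log_q[i]) for i in log_p)


# Hypothetical client that only holds examples of classes 0 and 1 out of 4.
local_labels = [0, 0, 1, 0, 1, 1]
masked = majority_labels(local_labels, num_classes=4)
teacher = [2.0, 1.0, 0.5, -0.5]   # global-model logits for one example
student = [3.0, 2.5, -1.0, -1.0]  # client-model logits for the same example
loss = lmd_loss(student, teacher, masked)
```

Because the majority classes are removed before normalization, changing the student's logits for classes 0 and 1 leaves this loss unchanged. The distillation signal therefore concerns only the classes the client rarely or never sees.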

Paper Structure (14 sections, 8 equations, 9 figures, 5 tables, 1 algorithm)


Figures (9)

  • Figure 1: A model trained on a client's private dataset with partial class labels $\mathcal{Y}^{-}$ is generally biased toward $\mathcal{Y}^{-}$ because it lacks knowledge of the complete label set $\mathcal{Y}$. FedLMD alleviates this bias by using the global model from the server to retain knowledge of the minority labels $\mathcal{Y}\backslash\mathcal{Y}^{-}$.
  • Figure 2: The label distribution of the training examples (Top), the prediction distribution of FedAvg (Middle), and the prediction distribution of FedLMD (Bottom) at different communication rounds.
  • Figure 3: The framework of our approach. In the aggregation process, the uploaded model weights $w_1, \ldots, w_K$ are combined by weighted averaging to obtain $w_g$. For each client, the training loss combines the cross-entropy loss $\mathcal{L}_{\rm CE}$ for learning from local data with the label-masking distillation loss $\mathcal{L}_{\rm LMD}$ for distilling from the global model.
  • Figure 4: The effect of knowledge distillation in FedAvg on CIFAR-10 ($\alpha=0.05$).
  • Figure 5: Accuracy (%) of methods that incur no additional computational cost under two partition strategies for CIFAR-10: Sharding (Left) and LDA (Right).
  • ...and 4 more figures
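
The two steps named in the Figure 3 caption, weighted averaging on the server and a combined loss on each client, can be sketched as follows. The caption does not state the averaging weights or how the two loss terms are balanced. Weighting by client sample count (as in standard FedAvg) and the coefficient `beta` are assumptions for illustration, and the function names are hypothetical.

```python
def aggregate(client_weights, client_sizes):
    """Weighted average of client parameter vectors w_1..w_K, giving w_g.

    Assumption: each client is weighted by its number of local examples.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")
    dim = len(client_weights[0])
    return [
        sum(n * w[j] for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def local_loss(ce_loss, lmd_loss_value, beta=1.0):
    """Client objective: L_CE plus beta * L_LMD (beta is a hypothetical knob)."""
    return ce_loss + beta * lmd_loss_value


# Two hypothetical clients with flattened two-parameter models.
w_g = aggregate([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
total_loss = local_loss(ce_loss=1.0, lmd_loss_value=0.5, beta=2.0)
```

Here the second client holds three times as many examples, so `w_g` lies closer to its parameters than to those of the first client.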