Federated Learning with Extremely Noisy Clients via Negative Distillation

Yang Lu, Lin Chen, Yonggang Zhang, Yiliang Zhang, Bo Han, Yiu-ming Cheung, Hanzi Wang

TL;DR

The paper tackles federated learning (FL) under extreme client label noise. Its method, FedNed, uses MC dropout uncertainty on a server-side public dataset to separate extremely noisy (EN) clients from mildly noisy (MN) ones, and then applies negative distillation. Instead of discarding EN clients, FedNed treats their models as bad teachers and pushes the global model away from their incorrect predictions, while MN clients contribute through standard aggregation. Locally, each EN client trains two models, one on its noisy labels and one on pseudo-labels, so that the pseudo-labeled model can later take part in aggregation. Extensive experiments on CIFAR-10/100 with varying non-IID and noise settings show that FedNed consistently outperforms baselines, and ablations confirm the effectiveness of EN identification, negative distillation, and pseudo-labeling. The method also remains robust as the number of EN clients grows and across different choices of public data.
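
To make the "bad teacher" idea concrete, here is a minimal sketch of a negative distillation term. It is not the authors' implementation. It assumes the term is the negated KL divergence between the EN model's prediction and the global model's prediction, so that minimizing it pushes the two apart. The function names and the direction of the KL divergence are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def negative_distillation_term(student_logits, bad_teacher_logits):
    """Hypothetical loss term: negated KL(bad teacher || student).

    Minimizing this value increases the divergence between the global
    (student) model and the EN (bad teacher) model on the same input.
    """
    student = softmax(student_logits)
    bad_teacher = softmax(bad_teacher_logits)
    return -kl_divergence(bad_teacher, student)


# A student that copies the bad teacher gets the worst (largest) value, about 0.
copying = negative_distillation_term([0.0, 4.0, 0.0], [0.0, 4.0, 0.0])
# A student that disagrees with the bad teacher gets a lower (better) value.
disagreeing = negative_distillation_term([4.0, 0.0, 0.0], [0.0, 4.0, 0.0])
```

Because this term is unbounded below, a practical objective would presumably combine it with other terms or bound it in some way; the sketch does not model that.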

Abstract

Federated learning (FL) has shown remarkable success in cooperatively training deep models, while typically struggling with noisy labels. Advanced works propose to tackle label noise by a re-weighting strategy with a strong assumption, i.e., mild label noise. However, this assumption may be violated in many real-world FL scenarios because of highly contaminated clients, resulting in extreme noise ratios, e.g., >90%. To tackle extremely noisy clients, we study the robustness of the re-weighting strategy, showing a pessimistic conclusion: minimizing the weight of clients trained over noisy data outperforms re-weighting strategies. To leverage models trained on noisy clients, we propose a novel approach, called negative distillation (FedNed). FedNed first identifies noisy clients and then, rather than discarding them, employs them in a knowledge distillation manner. In particular, clients identified as noisy ones are required to train models using noisy labels and pseudo-labels obtained by global models. The model trained on noisy labels serves as a 'bad teacher' in knowledge distillation, aiming to decrease the risk of providing incorrect information. Meanwhile, the model trained on pseudo-labels is involved in model aggregation if not identified as a noisy client. Consequently, through pseudo-labeling, FedNed gradually increases the trustworthiness of models trained on noisy clients, while leveraging all clients for model aggregation through negative distillation. To verify the efficacy of FedNed, we conduct extensive experiments under various settings, demonstrating that FedNed can consistently outperform baselines and achieve state-of-the-art performance. Our code is available at https://github.com/linChen99/FedNed.
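
The effect of a client's aggregation weight, which underlies the pessimistic conclusion about re-weighting, can be illustrated with a toy weighted average. This is a sketch under stated assumptions: model parameters are plain lists of floats, the numbers are made up, and `aggregate` is a hypothetical helper rather than part of the FedNed code base.

```python
def aggregate(client_params, weights):
    """Weighted average of client parameter vectors (lists of floats)."""
    if len(client_params) != len(weights):
        raise ValueError("one weight per client is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(client_params[0])
    return [
        sum(w * params[i] for w, params in zip(weights, client_params)) / total
        for i in range(dim)
    ]


# Toy setup: three clients agree on a parameter value, one noisy client is far off.
params = [[1.0], [1.0], [1.0], [9.0]]

uniform = aggregate(params, [1, 1, 1, 1])      # noisy client weighted like the rest
minimized = aggregate(params, [1, 1, 1, 0])    # noisy client's weight driven to zero
```

In this toy case the uniform average is pulled toward the outlier, while the average with the noisy client's weight set to zero matches the agreeing clients. It only illustrates why shrinking that weight helps; it says nothing about how much information is lost by doing so, which is the gap negative distillation is meant to address.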

Paper Structure (37 sections, 6 equations, 7 figures, 5 tables, 1 algorithm)

This paper contains 37 sections, 6 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: The test accuracy of the global model by controlling the weight of a single client model. We set ten client models including eight clean ones (with a noise ratio of 0%) and two extremely noisy ones (with a noise ratio of 99%). $K$ is the total number of clients, which is ten in this example.
  • Figure 2: The architecture overview of the proposed FedNed. In each round, the server identifies the mildly noisy (MN) and extremely noisy (EN) client models via MC dropout and prediction uncertainty. Negative distillation is then utilized to incorporate EN client models for a better global model.
  • Figure 3: Histogram of model prediction uncertainty for both MN and EN clients, where the uncertainty is accumulated over all training rounds.
  • Figure 4: Comparison in the feature spaces plotted with t-SNE. (a) FedNed without negative distillation on the server, (b) FedNed with negative distillation on the server.
  • Figure 5: Comparison of performance among methods as the number of extremely noisy clients increases.
  • ...and 2 more figures
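
The MC dropout step mentioned in the Figure 2 and Figure 3 captions can be sketched as follows. This is an illustrative outline, not the paper's procedure: it assumes uncertainty is measured as the entropy of the mean prediction over several dropout-enabled forward passes, and that clients are split by a fixed threshold. The function names, the entropy-based score, and the threshold rule are all assumptions. Which side of the threshold corresponds to EN clients is not stated in this excerpt, so the groups are labeled only as low and high uncertainty.

```python
import math


def predictive_entropy(mc_probs):
    """Entropy of the mean prediction over T stochastic (dropout-on) passes.

    mc_probs: list of T probability vectors for a single input.
    """
    num_passes = len(mc_probs)
    num_classes = len(mc_probs[0])
    mean = [sum(p[c] for p in mc_probs) / num_passes for c in range(num_classes)]
    return -sum(m * math.log(m) for m in mean if m > 0.0)


def client_uncertainty(mc_probs_per_sample):
    """Average predictive entropy of one client model over public samples."""
    scores = [predictive_entropy(sample) for sample in mc_probs_per_sample]
    return sum(scores) / len(scores)


def split_by_uncertainty(uncertainty_by_client, threshold):
    """Partition client ids into (low, high) uncertainty groups (assumed rule)."""
    low = [cid for cid, u in uncertainty_by_client.items() if u <= threshold]
    high = [cid for cid, u in uncertainty_by_client.items() if u > threshold]
    return low, high


# Made-up predictions: two dropout passes on one public sample per client.
confident = [[[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]
uniform = [[[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]]

scores = {
    "client_a": client_uncertainty(confident),
    "client_b": client_uncertainty(uniform),
}
low_group, high_group = split_by_uncertainty(scores, threshold=0.5)
```

In a real system the probability vectors would come from running each client model on the server's public dataset with dropout left active at inference time.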