Tackling Noisy Clients in Federated Learning with End-to-end Label Correction
Xuefeng Jiang, Sheng Sun, Jia Li, Jingjing Xue, Runhan Li, Zhiyuan Wu, Gang Xu, Yuwei Wang, Min Liu
TL;DR
This paper addresses the problem of heterogeneous label noise in federated learning by introducing FedELC, a two-stage framework that first identifies noisy clients using a class-aware loss matrix and a two-component Gaussian Mixture Model, then performs end-to-end label correction on data from detected noisy clients by learning a differentiable pseudo-ground-truth distribution $y^d$ through backpropagation. The local objective combines cross-entropy with a compatibility term and an entropy term, $L = L_c + \alpha L_{comp} + \beta L_e$, and uses distance-aware aggregation to reduce the influence of noisy clients, quantified through $D(i)$. Extensive experiments across CIFAR-10/100, CIFAR-10-N/CIFAR-100-N, and Clothing1M demonstrate that FedELC consistently outperforms sixteen baselines under varied noise patterns, while also improving the data quality of noisy clients. The method offers a practical pathway to robust FL in real-world deployments with imperfect annotations, at the cost of a modest computational overhead, and provides open-source code for replication.
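To make the objective $L = L_c + \alpha L_{comp} + \beta L_e$ and the learnable label distribution $y^d$ more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the exact form of each term is assumed (cross-entropy between $y^d$ and the model prediction for $L_c$, negative log-likelihood of the given noisy label under $y^d$ for $L_{comp}$, prediction entropy for $L_e$). The function names, the `init_scale` parameter, and the hand-derived gradient that stands in for automatic differentiation are all hypothetical. Model predictions are held fixed so that only the label update is visible.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def local_objective(probs, label_logits, noisy_labels, alpha, beta, eps=1e-12):
    """L = L_c + alpha * L_comp + beta * L_e (the term definitions are assumptions)."""
    y_d = softmax(label_logits)          # learnable soft labels y^d
    rows = np.arange(len(noisy_labels))
    l_c = -np.mean(np.sum(y_d * np.log(probs + eps), axis=1))    # fit predictions to y^d
    l_comp = -np.mean(np.log(y_d[rows, noisy_labels] + eps))     # keep y^d near given labels
    l_e = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))  # prediction entropy
    return l_c + alpha * l_comp + beta * l_e

def label_gradient(probs, label_logits, noisy_labels, alpha, eps=1e-12):
    """Per-sample gradient of L_c + alpha * L_comp w.r.t. the label logits.

    Derived by hand for this sketch (ignoring eps); an autograd framework
    would normally compute it. L_e does not depend on the label logits.
    """
    y_d = softmax(label_logits)
    a = -np.log(probs + eps)
    onehot = np.eye(probs.shape[1])[noisy_labels]
    grad_c = y_d * (a - np.sum(y_d * a, axis=1, keepdims=True))
    grad_comp = alpha * (y_d - onehot)
    return grad_c + grad_comp

def correct_labels(probs, noisy_labels, alpha=0.1, lr=1.0, steps=200, init_scale=2.0):
    # Initialise the label logits from the given (possibly noisy) labels,
    # then run plain gradient descent on them.
    label_logits = init_scale * np.eye(probs.shape[1])[noisy_labels]
    for _ in range(steps):
        label_logits -= lr * label_gradient(probs, label_logits, noisy_labels, alpha)
    return softmax(label_logits)

# Sample 0: the model is confident in class 2 but the given label is 0.
# Sample 1: the model agrees with the given label 0.
probs = np.array([[0.05, 0.05, 0.90],
                  [0.90, 0.05, 0.05]])
noisy_labels = np.array([0, 0])
y_d = correct_labels(probs, noisy_labels)
corrected = y_d.argmax(axis=1)
```

In this toy setting, a small $\alpha$ lets $y^d$ move toward the class the model predicts confidently, while a label the model already agrees with stays in place. In practice the model and $y^d$ would be updated jointly, and the balance set by $\alpha$ and $\beta$ matters.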
Abstract
Recently, federated learning (FL) has achieved wide success in diverse privacy-sensitive applications without sacrificing the sensitive private information of clients. However, the data quality of client datasets cannot be guaranteed, since the annotations of different clients often contain complex label noise of varying degrees, which inevitably causes performance degradation. Intuitively, this degradation is dominated by clients with higher noise rates, since their trained models contain more misinformation from the data; it is therefore necessary to devise an effective optimization scheme to mitigate the negative impact of these noisy clients. In this work, we propose a two-stage framework, FedELC, to tackle this complicated label noise issue. The first stage guides the detection of noisy clients with higher label noise, while the second stage corrects the labels of noisy clients' data via an end-to-end label correction framework, which learns possible ground-truth labels of noisy clients' datasets via back propagation. We implement sixteen related methods and evaluate on five datasets with three types of complicated label noise scenarios for a comprehensive comparison. Extensive experimental results demonstrate that our proposed framework outperforms its counterparts across different scenarios. Additionally, our label correction framework effectively improves the data quality of detected noisy clients' local datasets. The code is available at https://github.com/Sprinter1999/FedELC.
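As a rough illustration of the first stage, the sketch below separates clients into a low-loss and a high-loss group with a two-component Gaussian mixture fitted by plain EM. It is a simplification under stated assumptions: each client is reduced to a single scalar loss, whereas the paper works with a class-aware loss matrix. The function names, the 0.5 threshold, and the loss values are hypothetical and chosen only for the example.

```python
import numpy as np

def fit_gmm_1d(x, iters=100, var_floor=1e-6):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    x = np.asarray(x, dtype=float)
    mu = np.array([x.min(), x.max()])        # start the means at the extremes
    var = np.full(2, x.var() + var_floor)
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + var_floor
        pi = nk / len(x)
    return mu, var, pi, resp

def detect_noisy_clients(client_losses, threshold=0.5):
    """Return indices of clients assigned to the higher-mean (assumed noisy) component."""
    mu, _, _, resp = fit_gmm_1d(client_losses)
    noisy_component = int(np.argmax(mu))
    return np.where(resp[:, noisy_component] > threshold)[0]

# Hypothetical per-client average losses: four low-loss and three high-loss clients.
client_losses = [0.20, 0.25, 0.22, 0.24, 1.90, 2.10, 2.00]
noisy_clients = detect_noisy_clients(client_losses)
```

Clients flagged this way would then be the ones whose labels are passed to the correction stage; a per-class variant would apply the same idea to each column of a client-by-class loss matrix.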
