FedNoisy: Federated Noisy Label Learning Benchmark
Siqi Liang, Jintao Huang, Junyuan Hong, Dun Zeng, Jiayu Zhou, Zenglin Xu
TL;DR
FedNoisy addresses the lack of standardized benchmarks for Federated Noisy Label Learning (FNLL) by introducing a comprehensive benchmark with 20 federated scenes over 6 datasets and 9 baselines. It (i) defines three noise scenes (globalized, localized, and real-world) and three non-IID partition schemes to enable systematic FNLL evaluation; (ii) provides built-in datasets, namely MNIST, SVHN, CIFAR-10, CIFAR-100, Clothing1M, and WebVision; and (iii) reports key findings, such as the strong performance of SCE (symmetric cross entropy) and GCE (generalized cross entropy) in balancing accuracy against computational cost, and the nuanced interactions between symmetric vs. asymmetric noise and data heterogeneity. The results reveal that label-noise effects in FL are highly sensitive to partitioning schemes and noise patterns, underscoring the need for noise-adaptive, per-client strategies and robust aggregation. Code and datasets are publicly available to accelerate reproducible FNLL research.
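To make the noise scenes concrete, the following Python sketch shows one way globalized and localized label noise could be injected into per-client labels. It is not taken from the FedNoisy code base. The function names, the default noise rate of 0.4, the localized range [0.3, 0.5), and the pairwise c → c+1 flip used for asymmetric noise are all illustrative assumptions. We assume that globalized noise applies one shared noise rate to every client, whereas localized noise draws a separate rate for each client. Real-world noise needs no injection, since datasets such as Clothing1M and WebVision already carry noisy labels.

```python
import numpy as np

def symmetric_noise(labels, rate, num_classes, rng):
    """Flip each label with probability `rate` to a different class chosen uniformly."""
    labels = np.asarray(labels)
    noisy = labels.copy()
    flip = rng.random(labels.shape[0]) < rate
    # A non-zero offset guarantees the flipped label differs from the true one.
    offsets = rng.integers(1, num_classes, size=int(flip.sum()))
    noisy[flip] = (labels[flip] + offsets) % num_classes
    return noisy

def asymmetric_noise(labels, rate, num_classes, rng):
    """Flip each label with probability `rate` to the next class (assumed c -> c+1 mod K)."""
    labels = np.asarray(labels)
    noisy = labels.copy()
    flip = rng.random(labels.shape[0]) < rate
    noisy[flip] = (labels[flip] + 1) % num_classes
    return noisy

def add_noise_to_clients(client_labels, scene, num_classes, rng,
                         rate=0.4, rate_range=(0.3, 0.5), noise_fn=symmetric_noise):
    """Return noisy labels and the noise rate used for each client.

    'globalized': every client shares the same rate (assumed definition).
    'localized':  each client draws its own rate from `rate_range` (assumed definition).
    """
    n = len(client_labels)
    if scene == "globalized":
        rates = [rate] * n
    elif scene == "localized":
        low, high = rate_range
        rates = rng.uniform(low, high, size=n).tolist()
    else:
        # Real-world noise (e.g. Clothing1M, WebVision) comes with the data; nothing to inject.
        raise ValueError(f"unsupported synthetic noise scene: {scene!r}")
    noisy = [noise_fn(y, r, num_classes, rng) for y, r in zip(client_labels, rates)]
    return noisy, rates

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Five clients with synthetic 10-class labels (illustrative data only).
    clients = [rng.integers(0, 10, size=1000) for _ in range(5)]
    noisy, rates = add_noise_to_clients(clients, "localized", 10, rng)
    for y, y_noisy, r in zip(clients, noisy, rates):
        print(f"target rate {r:.2f}, observed flip rate {np.mean(y != y_noisy):.2f}")
```

Symmetric noise spreads flips uniformly over the other classes. The asymmetric variant concentrates them on a single neighbouring class, which mimics class-dependent confusions more closely.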
Abstract
Federated learning (FL) has gained popularity for distributed learning without aggregating sensitive data from clients. At the same time, because data remain distributed and isolated across clients, data quality is harder to control, which makes FL more vulnerable to noisy labels. Many efforts exist to defend against the negative impacts of noisy labels in centralized or federated settings. However, there is no benchmark that comprehensively considers the impact of noisy labels across a wide variety of typical FL settings. In this work, we present the first standardized benchmark to help researchers fully explore potential federated noisy settings. We also conduct comprehensive experiments to explore the characteristics of these data settings and to compare baselines, which may guide future method development. We highlight the 20 basic settings over 6 datasets proposed in our benchmark, together with a standardized simulation pipeline for federated noisy label learning that includes implementations of 9 baselines. We hope this benchmark can facilitate idea verification in federated learning with noisy labels. FedNoisy is available at https://github.com/SMILELab-FL/FedNoisy.
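A federated simulation pipeline typically begins by splitting a centralized dataset across clients. As an illustration of how a non-IID split can be generated, the sketch below implements Dirichlet-based label skew, a scheme widely used in FL benchmarks. We assume, but do not claim from the source, that it resembles one of the benchmark's partition schemes. The function name `dirichlet_partition` and the concentration parameter `alpha` are hypothetical. Smaller `alpha` produces more skewed per-client label distributions.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, rng):
    """Split sample indices across clients with per-class proportions ~ Dir(alpha).

    Hypothetical helper for illustration; not the FedNoisy implementation.
    """
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Fraction of class c assigned to each client.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        # Cumulative cut points; the final boundary is implicit, so every index is assigned.
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            client_indices[k].extend(part.tolist())
    return [np.array(ix, dtype=int) for ix in client_indices]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 10, size=5000)  # synthetic labels for 10 classes
    parts = dirichlet_partition(labels, num_clients=5, alpha=0.5, rng=rng)
    for k, ix in enumerate(parts):
        counts = np.bincount(labels[ix], minlength=10)
        print(f"client {k}: {len(ix)} samples, class counts {counts.tolist()}")
```

Label noise would then be applied to each client's subset. This keeps the partition scheme and the noise pattern independent knobs of the simulation.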
