FedNoisy: Federated Noisy Label Learning Benchmark

Siqi Liang; Jintao Huang; Junyuan Hong; Dun Zeng; Jiayu Zhou; Zenglin Xu

TL;DR

FedNoisy addresses the lack of standardized benchmarks for Federated Noisy Label Learning (FNLL) by introducing a comprehensive benchmark with 20 federated scenes over 6 datasets and 9 baselines. It (i) defines three noise scenes (globalized, localized, real-world) and three non-IID partition schemes to enable systematic FNLL evaluation, (ii) provides built-in datasets including MNIST, SVHN, CIFAR-10, CIFAR-100, Clothing1M, and WebVision, and (iii) reports key findings, such as the strong performance of SCE and GCE in balancing accuracy and computation, and the nuanced interactions between symmetric or asymmetric noise and data heterogeneity. The results show that label-noise effects in FL are highly sensitive to partitioning schemes and noise patterns, underscoring the need for noise-adaptive, per-client strategies and robust aggregation. Code and datasets are publicly available to accelerate reproducible FNLL research.

Abstract

Federated learning has gained popularity for distributed learning without aggregating sensitive data from clients. However, because client data are distributed and isolated, data quality is harder to control, which makes federated learning more vulnerable to noisy labels. Many efforts exist to defend against the negative impacts of noisy labels in centralized or federated settings. Yet there is no benchmark that comprehensively considers the impact of noisy labels across a wide variety of typical FL settings. In this work, we present the first standardized benchmark that helps researchers fully explore potential federated noisy settings. We also conduct comprehensive experiments to explore the characteristics of these data settings and to compare baselines, which may guide future method development. We highlight the 20 basic settings for 6 datasets proposed in our benchmark and a standardized simulation pipeline for federated noisy label learning, including implementations of 9 baselines. We hope this benchmark can facilitate idea verification in federated learning with noisy labels. FedNoisy is available at https://github.com/SMILELab-FL/FedNoisy.

Paper Structure (28 sections, 4 equations, 11 figures, 12 tables)

Introduction
Related Works
Problem Formulation
Benchmark Design
Label-Noise Simulations for Heterogeneous Clients
Datasets and Data Heterogeneity Simulation
Implemented methods
Experiments
Benchmark performance
Basic observations.
Baseline Performance
Generalizability over different model architectures.
Baseline robustness across datasets and noise settings.
Accuracy and computation trade-off.
Fine-grained Ablation Studies of FNLL
...and 13 more sections

Figures (11)

Figure 1: Federated noise scenes provided in FedNoisy. Left: globalized noise; middle: localized noise; right: real-world noise. Globalized noise uses a constant noise ratio $\varepsilon_{global}$ for label corruption, while localized noise draws a client-specific noise ratio $\varepsilon_{k}$ from a uniform distribution for each client $k$.
Figure 2: The trade-off between accuracy and per-round local computation time on CIFAR-10 with 10 clients under the noniid-#label=3 partition with symmetric noise. From left to right: globalized noise, localized noise. Baselines marked with * use an 8-layer CNN following DBLP:conf/icml/MaH00E020; all others use the default VGG16.
Figure 3: The trade-off between accuracy and per-round local computation time on Clothing1M with 10 clients under the noniid-#label=5 partition with real noise. Red and green dashed boxes group the baselines into tiers by test accuracy and per-round time, respectively. T1 denotes tier 1, and T2 denotes tier 2.
Figure 4: Gradient norm of global model for clean and globalized noise $\varepsilon_{global}=0.4$ settings.
Figure 5: Accuracy for different noise ratios on CIFAR-10 with 10 clients. The $x$-axis noise ratio is $\varepsilon_{global}$ for globalized noise, and $\varepsilon_{local}$ for localized noise $\mathcal{U}(\varepsilon_{local}-0.1, \varepsilon_{local}+0.1)$. From left to right: globalized noise, localized noise.
...and 6 more figures
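
The captions of Figures 1 and 5 describe how per-client noise ratios are chosen: a constant $\varepsilon_{global}$ in the globalized scene, and a draw from $\mathcal{U}(\varepsilon_{local}-0.1, \varepsilon_{local}+0.1)$ for each client in the localized scene. The following is a minimal sketch of that idea in Python with NumPy, not the FedNoisy implementation. The function names `client_noise_ratios` and `symmetric_flip` are hypothetical, and the symmetric flip assumes a corrupted label is always replaced by a different class chosen uniformly at random, which is one common convention and may differ from the benchmark's.

```python
import numpy as np


def client_noise_ratios(num_clients, scene, eps, rng, half_width=0.1):
    """Return one noise ratio per client (hypothetical helper).

    globalized: every client shares the same ratio eps.
    localized:  each client draws its ratio from U(eps - half_width, eps + half_width),
                clipped to [0, 1].
    """
    if scene == "globalized":
        return np.full(num_clients, float(eps))
    if scene == "localized":
        low = max(eps - half_width, 0.0)
        high = min(eps + half_width, 1.0)
        return rng.uniform(low, high, size=num_clients)
    raise ValueError(f"unknown scene: {scene!r}")


def symmetric_flip(labels, noise_ratio, num_classes, rng):
    """Corrupt each label with probability noise_ratio (assumed convention).

    A corrupted label is replaced by a uniformly chosen *different* class.
    """
    labels = np.asarray(labels)
    noisy = labels.copy()
    flip = rng.random(labels.shape[0]) < noise_ratio
    # An offset in [1, num_classes - 1], taken modulo num_classes,
    # always lands on a class other than the original one.
    offsets = rng.integers(1, num_classes, size=int(flip.sum()))
    noisy[flip] = (labels[flip] + offsets) % num_classes
    return noisy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ratios = client_noise_ratios(10, "localized", 0.4, rng)
    # Toy per-client label arrays standing in for a real partition.
    client_labels = [rng.integers(0, 10, size=500) for _ in range(10)]
    for k, (y, eps_k) in enumerate(zip(client_labels, ratios)):
        y_noisy = symmetric_flip(y, eps_k, 10, rng)
        print(f"client {k}: target ratio {eps_k:.2f}, "
              f"observed {np.mean(y_noisy != y):.2f}")
```

With a finite number of samples per client, the observed corruption rate only approximates the target ratio, so the printed values fluctuate around each $\varepsilon_{k}$.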
