FedPCA: Noise-Robust Fair Federated Learning via Performance-Capacity Analysis

Nannan Wu, Zengqiang Yan, Nong Sang, Li Yu, Chang Wen Chen

TL;DR

This work targets robust fairness in federated learning under label noise by introducing performance-capacity analysis. It jointly considers model performance and dataset handling capacity through the loss $\ell_t(\bar{D}_k)$ and a dispersion score $S_t(\bar{D}_k)$, enabling reliable identification of mislabeled clients via a Gaussian Mixture Model on $(\ell_t, S_t)$. FedPCA then discards or selectively uses data from mislabeled clients (via Drop or High-Confidence Sampling) and adjusts global aggregation with weights $w_{t,k}$ that depend on reliable data $\hat{N}_{t,k}$, label confidence $r_{t,k}$, and dispersion, thereby balancing robustness and fairness. Empirical results on CIFAR-10, RSNA ICH, and ISIC 2019 show that FedPCA consistently outperforms baselines, validating its effectiveness and suggesting practical utility for trustworthy FL in heterogeneous, privacy-preserving settings. Code availability is promised upon acceptance.

Abstract

Training a model that effectively handles both common and rare data (i.e., achieving performance fairness) is crucial in federated learning (FL). While existing fair FL methods have shown effectiveness, they remain vulnerable to mislabeled data, so ensuring robustness in fair FL is essential. However, fairness and robustness inherently compete, which causes robust strategies to hinder fairness. In this paper, we attribute this competition to the homogeneity in loss patterns exhibited by rare and mislabeled data clients, which prevents existing loss-based fair and robust FL methods from effectively distinguishing and handling these two distinct client types. To address this, we propose performance-capacity analysis, which jointly considers model performance on each client and its capacity to handle the dataset, measured by loss and a newly introduced feature dispersion score. This allows mislabeled clients to be identified by performance that deviates significantly from their capacity, while rare data clients are preserved. Building on this, we introduce FedPCA, an FL method that robustly achieves fairness. FedPCA first identifies mislabeled clients via a Gaussian Mixture Model on loss-dispersion pairs, then applies fairness and robustness strategies in global aggregation and local training by adjusting client weights and selectively using reliable data. Extensive experiments on three datasets demonstrate FedPCA's effectiveness in tackling this complex challenge. Code will be publicly available upon acceptance.
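To make the identification step concrete, the following is a minimal sketch of fitting a two-component Gaussian mixture to per-client (loss, dispersion) pairs. It is not the authors' implementation: the dispersion score is not defined in this excerpt, so the pairs below are synthetic placeholders, and the diagonal covariance, the hand-written EM loop (`fit_two_gaussians`), and the rule of flagging the component with the higher mean loss are all assumptions chosen for illustration.

```python
import math
import random


def fit_two_gaussians(points, n_iter=50):
    """EM for a 2-component Gaussian mixture with diagonal covariance (assumed form)."""
    n, dim = len(points), len(points[0])
    # Deterministic start: seed the components at the lowest- and highest-loss clients.
    means = [list(min(points, key=lambda p: p[0])),
             list(max(points, key=lambda p: p[0]))]
    centre = [sum(p[d] for p in points) / n for d in range(dim)]
    spread = [sum((p[d] - centre[d]) ** 2 for p in points) / n + 1e-6
              for d in range(dim)]
    variances = [spread[:], spread[:]]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each client.
        resp = []
        for p in points:
            log_p = []
            for k in range(2):
                lp = math.log(weights[k])
                for d in range(dim):
                    v = variances[k][d]
                    lp -= 0.5 * math.log(2 * math.pi * v) + (p[d] - means[k][d]) ** 2 / (2 * v)
                log_p.append(lp)
            top = max(log_p)  # subtract the max for numerical stability
            unnorm = [math.exp(lp - top) for lp in log_p]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate mixing weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / n
            for d in range(dim):
                means[k][d] = sum(r[k] * p[d] for r, p in zip(resp, points)) / nk
                variances[k][d] = sum(r[k] * (p[d] - means[k][d]) ** 2
                                      for r, p in zip(resp, points)) / nk + 1e-6
    return means, resp


# Synthetic (loss, dispersion) pairs; these values are placeholders, not paper data.
rng = random.Random(0)
clean = [(rng.gauss(0.5, 0.05), rng.gauss(1.0, 0.05)) for _ in range(8)]
noisy = [(rng.gauss(2.0, 0.10), rng.gauss(0.4, 0.05)) for _ in range(4)]
pairs = clean + noisy

means, resp = fit_two_gaussians(pairs)
# Assumed rule: treat the component with the higher mean loss as "mislabeled".
suspect = max(range(2), key=lambda k: means[k][0])
flagged = [i for i, r in enumerate(resp) if r[suspect] > 0.5]
print("flagged client indices:", flagged)
```

The point of clustering on both coordinates rather than loss alone is that a client's loss is judged relative to its dispersion. How FedPCA actually chooses the mislabeled component is not specified in this excerpt, so the selection rule above should be read as a stand-in.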

Paper Structure

This paper contains 25 sections, 5 equations, 10 figures, 12 tables, 1 algorithm.

Figures (10)

  • Figure 1: (a) Our setting involves both fairness and robustness challenges. The former arises from imbalanced data scales across diverse data distributions, while the latter stems from clients with mislabeled data. (b) In the absence of mislabeled data, fairness can be improved using existing fair FL methods such as FedISM. However, these methods remain highly vulnerable to mislabeled data, even when combined with a state-of-the-art robust FL technique such as FedNoRo. Our proposed method effectively addresses this hybrid challenge, achieving fairness in a robust manner. Results are obtained from experiments conducted on RSNA ICH.
  • Figure 2: Comparison of aggregation weights assigned to rare and mislabeled data clients for vanilla FL (FedAvg), fair FL (FedISM), and its combination with robust FL (FedNoRo). The transparent area represents the standard deviation. Experiments are conducted on RSNA ICH.
  • Figure 3: Visualization of performance-capacity analysis.
  • Figure 4: Aggregation weights assigned to clients over communication rounds, with the transparent area indicating the standard deviation. The legend in the first figure remains consistent across both figures. Experiments are conducted on RSNA ICH.
  • Figure 5: Aggregation weights assigned to clients over communication rounds, with the transparent area indicating the standard deviation. The legend in the first figure remains consistent across all figures.
  • ...and 5 more figures