Distribution-Free Fair Federated Learning with Small Samples

Qichuan Yin; Zexian Wang; Junzhou Huang; Huaxiu Yao; Linjun Zhang

TL;DR

This work tackles fairness in federated learning under finite-sample, distribution-free constraints by introducing FedFaiREE, a post-processing method that uses order statistics and distributed quantile sketches to select group-specific thresholds ensuring $|DEOO|<\alpha$ with high probability. The approach separates candidate-set construction from threshold selection and provides a theoretical guarantee that the fairness constraint holds with probability at least $(1-\delta)^N$, while achieving near-Bayes-optimal accuracy when the base predictor is close to optimal. It further extends to label shift, Equalized Odds, and multi-group settings, and demonstrates robust empirical performance on real datasets (Adult, Compas, ACSIncome) against several baselines. The combination of finite-sample, distribution-free guarantees and compatibility with decentralized training makes FedFaiREE a practical tool for fair decision-making in heterogeneous FL deployments.
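To make the fairness quantity concrete, here is a minimal sketch of an empirical $|DEOO|$ computation for a pair of group-specific thresholds. It assumes the usual Equality of Opportunity reading, in which DEOO is the difference between the true positive rates of the two sensitive groups; that definition is not spelled out in this summary. The function names `true_positive_rate` and `abs_deoo` are hypothetical and do not come from the paper.

```python
from typing import Sequence


def true_positive_rate(scores: Sequence[float],
                       labels: Sequence[int],
                       threshold: float) -> float:
    """Fraction of Y = 1 samples whose score is strictly above the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        raise ValueError("need at least one sample with Y = 1")
    return sum(s > threshold for s in positives) / len(positives)


def abs_deoo(scores0: Sequence[float], labels0: Sequence[int], t0: float,
             scores1: Sequence[float], labels1: Sequence[int], t1: float) -> float:
    """Empirical |DEOO| (assumed definition): gap between the groups' true positive rates."""
    tpr0 = true_positive_rate(scores0, labels0, t0)
    tpr1 = true_positive_rate(scores1, labels1, t1)
    return abs(tpr0 - tpr1)
```

A threshold pair would be considered acceptable in this empirical sense when `abs_deoo(...)` falls below the tolerance $\alpha$. The paper's guarantee is a finite-sample, high-probability statement about the population quantity, which this plug-in estimate does not by itself provide.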

Abstract

As federated learning gains increasing importance in real-world applications due to its capacity for decentralized data training, addressing fairness concerns across demographic groups becomes critically important. However, most existing machine learning algorithms for ensuring fairness are designed for centralized data environments and generally require large-sample and distributional assumptions, underscoring the urgent need for fairness techniques adapted for decentralized and heterogeneous systems with finite-sample and distribution-free guarantees. To address this issue, this paper introduces FedFaiREE, a post-processing algorithm developed specifically for distribution-free fair learning in decentralized settings with small samples. Our approach accounts for unique challenges in decentralized environments, such as client heterogeneity, communication costs, and small sample sizes. We provide rigorous theoretical guarantees for both fairness and accuracy, and our experimental results further provide robust empirical validation for our proposed method.

Paper Structure (36 sections, 25 theorems, 88 equations, 12 figures, 9 tables, 7 algorithms)

Introduction
Additional Related Work
Preliminaries
Enabling Fair Federated Learning
Problem formulation
Candidate set construction with distributed quantile algorithm
Selection for the optimal threshold
Theoretical Guarantees
Extension to Different Scenarios
Label Shift in Test Set
Equalized Odds
Extension to Multi-Groups
Experiments
Conclusion
Proofs
...and 21 more sections

Key Result

Proposition 3.2

Under Assumption ass:mix, for $a \in \{0,1\}$, consider $k^{1, a} \in \{1, \ldots, n^{1, a}\}$, the corresponding $k_i^{1,a}$ for $i \in [S]$, and the score-based classifier $\phi(x, a) = \mathbbm{1}\{f(x, a) > t_{(k^{1, a})}^{1, a}\}$. Define … Then we have: … where $\pi_i^{1,a} = \mathbb{P}( x \text{ from client } i \mid x \text{ with } Y=1, A=a)$.
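The score-based classifier in the proposition can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: it assumes that $t_{(k)}^{1,a}$ denotes the $k$-th smallest score among samples with $Y=1$ and $A=a$, and it takes the score $f(x,a)$ as input rather than $x$ itself. The names `order_statistic` and `make_classifier` are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple


def order_statistic(values: Sequence[float], k: int) -> float:
    """Return the k-th smallest value (1-indexed); ascending order is an assumption."""
    if not 1 <= k <= len(values):
        raise ValueError("k must lie in {1, ..., n}")
    return sorted(values)[k - 1]


def make_classifier(
    positive_scores: Dict[int, Sequence[float]],
    ranks: Dict[int, int],
) -> Tuple[Callable[[float, int], int], Dict[int, float]]:
    """Build phi(score, a) = 1{score > t_(k_a)} with one threshold per group a.

    positive_scores[a] holds the scores f(x, a) of samples with Y = 1 and A = a;
    ranks[a] is the chosen rank k^{1,a} for that group.
    """
    thresholds = {
        a: order_statistic(positive_scores[a], ranks[a]) for a in positive_scores
    }

    def phi(score: float, a: int) -> int:
        # Strict inequality, matching the indicator in the proposition.
        return int(score > thresholds[a])

    return phi, thresholds
```

Because each threshold is an order statistic of observed scores, choosing a classifier reduces to choosing a rank pair $(k^{1,0}, k^{1,1})$, which is what allows the candidate set to be enumerated separately from the final threshold selection.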

Figures (12)

Figure 1: The distribution of $|DEOO|$ (a fairness metric) for FedAvg and FairFed, both with and without FedFaiREE, evaluated on the Adult dataset. See the Experiments section for experiment details.
Figure 2: Overview of FedFaiREE. With S clients and a pre-trained model in consideration, each circle in the image symbolizes a datapoint score in the training set. The color of the circles represents different sensitive labels, while the gray edges depict local ranks of threshold pairs (each global classifier's threshold pair corresponds to S local ranks). Notably, the red edge signifies the chosen global classifier with thresholds $\{t^{*,0}, t^{*,1}\}$ for sensitive labels $A=0$ and $A=1$, respectively.
...and 7 more figures

Theorems & Definitions (46)

Definition 2.1
Definition 2.2
Proposition 3.2
Definition 3.3
Proposition 3.4
Proposition 3.5
Theorem 4.2
Theorem 5.2
Definition 5.3
Theorem 5.4
...and 36 more
