
AFed: Algorithmic Fair Federated Learning

Huiqiang Chen, Tianqing Zhu, Wanlei Zhou, Wei Zhao

TL;DR

Two approaches are proposed: AFed-G, which uses a conditional generator trained on the server side, and AFed-GAN, which improves upon AFed-G by training a conditional GAN on the client side. In both cases, the client data is augmented with the generated samples to help remove bias.

Abstract

Federated Learning (FL) has gained significant attention as it facilitates collaborative machine learning among multiple clients without centralizing their data on a server. FL protects the privacy of participating clients by storing their data locally, which creates new challenges in fairness. Traditional debiasing methods assume centralized access to sensitive information, rendering them impractical for the FL setting. Additionally, FL is more susceptible to fairness issues than centralized machine learning due to the diverse client data sources that may be associated with group information. Therefore, training a fair model in FL without access to client local data is important and challenging. This paper presents AFed, a straightforward yet effective framework for promoting group fairness in FL. The core idea is to circumvent restricted data access by learning the global data distribution. This paper proposes two approaches: AFed-G, which uses a conditional generator trained on the server side, and AFed-GAN, which improves upon AFed-G by training a conditional GAN on the client side. We augment the client data with the generated samples to help remove bias. Our theoretical analysis justifies the proposed methods, and empirical results on multiple real-world datasets demonstrate a substantial improvement of AFed over several baselines.

Paper Structure (33 sections, 4 theorems, 32 equations, 9 figures, 1 algorithm)

Key Result

Theorem 1

Given the optimal classification head $h^y$, optimizing Eq. (AFed-G-opt) is equivalent to […], where $\prod\limits_{k=1}^N{P^k_{A,Z}}$ and $\prod\limits_{k=1}^N{P^k_{Z}}$ are the joint probability distributions of the clients' data, and $H_Q(A \mid Z)=\mathbb{E}_{a,z\sim Q_{A,Z}}[\log q(a \mid z)]$ is the conditional entropy of the generated samples.

Figures (9)

  • Figure 1: The data distribution of each client in the toy example. Data is sampled from a mixture of four Gaussian distributions. For each client, 85% of the data is sampled from a single Gaussian distribution, and the remaining 15% is sampled evenly from the other three Gaussian distributions.
  • Figure 2: (a) The global distribution of all clients' data; (b) The distribution of the generated data, which lies in the same space as the real data.
  • Figure 3: An overview of the AFed framework. The key idea is to first extract the clients' local distributions and obtain a view of the global distribution via a conditional generator $G$, which is then shared with all clients to help local debiasing.
  • Figure 4: (a) One classification head: the extractor $E_k$ is trained solely with the feedback of $h^y_k$. Samples with the same label $y=1$ but different attributes $a=0$ and $a=1$ are mapped to the same area in latent space. (b) Two classification heads: $E_k$ is now trained with feedback from both heads. Samples are better separated by $y$ and $a$.
  • Figure 5: UMAP (McInnes et al., 2018) projection of fake and real features. (a) One classification head; (b) Two classification heads.
  • ...and 4 more figures
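
The sampling scheme described for Figure 1 can be sketched as follows. This is only an illustration under stated assumptions: the four component means, the standard deviation, the two-dimensional feature space, the sample count, and the function name `sample_client_data` are all hypothetical, since the caption specifies only the 85%/15% split across four Gaussians.

```python
import numpy as np

def sample_client_data(client_id, n_samples=1000, major_frac=0.85, std=0.5, seed=0):
    """Draw one client's toy data from a mixture of four 2-D Gaussians.

    Assumption: client `client_id` (0..3) takes `major_frac` of its data from
    component `client_id` and splits the rest evenly over the other three.
    """
    # Hypothetical component means; the source does not give them.
    means = np.array([[-2.0, -2.0], [-2.0, 2.0], [2.0, -2.0], [2.0, 2.0]])
    n_components = len(means)
    rng = np.random.default_rng(seed + client_id)

    # Number of samples per mixture component for this client.
    n_major = int(round(major_frac * n_samples))
    others = [c for c in range(n_components) if c != client_id]
    base, extra = divmod(n_samples - n_major, len(others))
    counts = np.zeros(n_components, dtype=int)
    counts[client_id] = n_major
    for i, c in enumerate(others):
        counts[c] = base + (1 if i < extra else 0)

    # Component index of every sample, then isotropic Gaussian noise around its mean.
    components = np.repeat(np.arange(n_components), counts)
    x = means[components] + std * rng.standard_normal((n_samples, 2))
    return x, components

x, components = sample_client_data(client_id=1)
print(np.bincount(components, minlength=4))
```

With the default arguments, client 1 draws 850 samples from its own component and 50 from each of the other three, which mirrors the skew that makes each client's local distribution differ from the global one.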

Theorems & Definitions (10)

  • Definition 1: Demographic parity (Hardt et al., 2016)
  • Theorem 1
  • proof
  • Lemma 1
  • Theorem 2
  • proof
  • Remark 1
  • proof
  • Theorem 3
  • proof
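
As a rough illustration of the group-fairness notion named in Definition 1, demographic parity is commonly stated as $P(\hat{y}=1 \mid a=0) = P(\hat{y}=1 \mid a=1)$, and the violation is often measured by the absolute difference between the two rates. The sketch below computes that gap. It assumes binary predictions and a binary sensitive attribute; the function name `demographic_parity_gap` and the example arrays are hypothetical and are not taken from the paper.

```python
import numpy as np

def demographic_parity_gap(y_pred, a):
    """Absolute difference in positive-prediction rates between groups a=0 and a=1.

    Assumption: `y_pred` and `a` are binary (0/1) arrays of equal length.
    """
    y_pred = np.asarray(y_pred)
    a = np.asarray(a)
    if y_pred.shape != a.shape:
        raise ValueError("y_pred and a must have the same shape")
    if not (np.any(a == 0) and np.any(a == 1)):
        raise ValueError("both groups must be present")

    rate_a0 = y_pred[a == 0].mean()  # estimate of P(y_hat = 1 | a = 0)
    rate_a1 = y_pred[a == 1].mean()  # estimate of P(y_hat = 1 | a = 1)
    return abs(rate_a0 - rate_a1)

# Made-up predictions: group 0 receives 3/4 positives, group 1 receives 1/4.
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
a = np.array([0, 0, 0, 0, 1, 1, 1, 1])
gap = demographic_parity_gap(y_pred, a)
print(gap)
```

A gap of zero means both groups receive positive predictions at the same rate; larger values indicate a stronger dependence of the prediction on the sensitive attribute.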