Privacy-preserving Quantification of Non-IID Degree in Federated Learning

Yuping Yan, Yizhi Wang, Yingchao Yu, Yaochu Jin

TL;DR

This paper proposes a quantitative definition of the non-IID degree in the federated environment based on the cumulative distribution function (CDF). The resulting method is called the Fully Homomorphic Encryption-based Federated Cumulative Distribution Function (FHE-FCDF).

Abstract

Federated learning (FL) offers a privacy-preserving approach to machine learning in which multiple collaborators train a model without sharing raw data. However, non-independent and non-identically distributed (non-IID) datasets across clients present a significant challenge to FL, leading to a sharp drop in accuracy, reduced efficiency, and obstacles to deployment. Various methods have been proposed to address the non-IID problem, including clustering and personalized FL frameworks. Nevertheless, a formal quantitative definition of the non-IID degree between different clients' datasets is still missing, which prevents clients from comparing their data distributions with those of other clients and from obtaining an overview of them. For the first time, this paper proposes a quantitative definition of the non-IID degree in the federated environment based on the cumulative distribution function (CDF); the resulting method is called the Fully Homomorphic Encryption-based Federated Cumulative Distribution Function (FHE-FCDF). The method uses fully homomorphic encryption, a cryptographic primitive, to let clients estimate the non-IID degree while preserving privacy. Experiments on the CIFAR-100 non-IID dataset validate the effectiveness of the proposed method.

Paper Structure (13 sections, 5 equations, 8 figures, 1 algorithm)

This paper contains 13 sections, 5 equations, 8 figures, and 1 algorithm.

Figures (8)

  • Figure 1: An overview of the federated learning framework.
  • Figure 2: Weight divergence of FL with IID and non-IID data.
  • Figure 3: CDF vs. empirical CDF.
  • Figure 4: An overview of the FHE-FCDF approach, which involves one server and multiple clients. ①: All clients send the variable $x$ to the server. ②: The server aggregates all $x$ variables. ③: The server sends the distribution policy of the variable $x$ to all clients. ④: Each client computes its CDF from the distribution policy and its local database. ⑤: Each client encrypts its CDF values with the encryption key of the fully homomorphic encryption scheme. ⑥: All clients send their encrypted CDF values to the server as ciphertext. ⑦: The server performs the fully homomorphic evaluation on these values to obtain the aggregated CDF (central CDF). ⑧: The server returns the aggregated CDF to all clients. ⑨: Each client decrypts the aggregated CDF with the decryption key of the fully homomorphic encryption scheme. ⑩: Each client compares the central CDF with its local CDF and quantifies the non-IID degree.
  • Figure 5: The fully homomorphic evaluation process.
  • ...and 3 more figures
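
The data flow in the Figure 4 caption can be sketched in plaintext Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The shared evaluation grid stands in for the "distribution policy", the aggregation is taken to be a sample-size-weighted average, and the non-IID degree is taken to be the maximum absolute gap between the local and central CDF (a Kolmogorov-Smirnov-style distance). The paper's own definitions may differ on all three points. The encryption and decryption steps (⑤ to ⑨) are omitted, so the server here averages in the clear where the real protocol would evaluate ciphertexts. All function and variable names are hypothetical.

```python
from bisect import bisect_right


def empirical_cdf(samples, grid):
    """Step 4 (sketch): fraction of local samples <= each shared grid point."""
    ordered = sorted(samples)
    n = len(ordered)
    return [bisect_right(ordered, g) / n for g in grid]


def aggregate_cdfs(local_cdfs, weights):
    """Step 7 (sketch): weighted average of client CDFs, done in plaintext.

    Assumption: in the real protocol this sum would be evaluated
    homomorphically on encrypted CDF values; no encryption happens here.
    """
    total = sum(weights)
    return [
        sum(w * cdf[i] for cdf, w in zip(local_cdfs, weights)) / total
        for i in range(len(local_cdfs[0]))
    ]


def non_iid_degree(local_cdf, central_cdf):
    """Step 10 (sketch): largest absolute gap between local and central CDF.

    Assumption: a KS-style distance chosen for illustration only.
    """
    return max(abs(a - b) for a, b in zip(local_cdf, central_cdf))


# Hypothetical toy data: three clients holding a scalar variable x.
grid = [0, 1, 2, 3]
clients = {
    "a": [0, 0, 1, 1],  # skewed toward small values
    "b": [2, 2, 3, 3],  # skewed toward large values
    "c": [0, 1, 2, 3],  # matches the pooled distribution
}

local = {name: empirical_cdf(x, grid) for name, x in clients.items()}
central = aggregate_cdfs(list(local.values()), [len(x) for x in clients.values()])
degrees = {name: non_iid_degree(cdf, central) for name, cdf in local.items()}

for name, degree in degrees.items():
    print(name, degree)
```

On this toy data, client c matches the pooled distribution exactly and gets a degree of 0, while the two skewed clients a and b each get 0.5.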

Theorems & Definitions (2)

  • Definition 1: Federated Learning
  • Definition 2: Homomorphic Encryption