A Survey of What to Share in Federated Learning: Perspectives on Model Utility, Privacy Leakage, and Communication Efficiency

Jiawei Shao, Zijian Li, Wenqiang Sun, Tailin Zhou, Yuchang Sun, Lumin Liu, Zehong Lin, Yuyi Mao, Jun Zhang

TL;DR

This paper addresses the question of what to share in federated learning beyond model parameters by proposing a taxonomy of three sharing modalities: model parameters, synthetic data, and knowledge. It systematically reviews how each modality affects model utility, privacy leakage, and communication efficiency, supported by experiments on SVHN and CIFAR-10 with non-IID data. The authors analyze privacy attacks (targeting gradients, parameters, logits, and intermediate features) and defenses (cryptography-based and perturbation-based), and illustrate the resulting tradeoffs with empirical results and Pareto front analyses. The work highlights the limitations of FL methods that predominantly share parameters and suggests hybrid aggregation and privacy-preserving data generation as promising directions for future research.

Abstract

Federated learning (FL) has emerged as a secure paradigm for collaborative training among clients. Without data centralization, FL allows clients to share local information in a privacy-preserving manner. This approach has gained considerable attention, prompting numerous surveys that summarize the related work. However, the majority of these surveys concentrate on FL methods that share model parameters during the training process, while overlooking the possibility of sharing local information in other forms. In this paper, we present a systematic survey from the new perspective of what to share in FL, with an emphasis on model utility, privacy leakage, and communication efficiency. First, we present a new taxonomy of FL methods in terms of three sharing methods, which respectively share models, synthetic data, and knowledge. Second, we analyze the vulnerability of different sharing methods to privacy attacks and review the defense mechanisms. Third, we conduct extensive experiments to compare the learning performance and communication overhead of various sharing methods in FL. In addition, we assess the potential privacy leakage through model inversion and membership inference attacks, and compare the effectiveness of various defense approaches. Finally, we identify future research directions and conclude the survey.

Paper Structure (47 sections, 8 figures, 7 tables)

Figures (8)

  • Figure 1: An overview of different sharing methods in FL.
  • Figure 2: Overview of the survey.
  • Figure 3: Two types of privacy attacks in FL: (Left) passive attack and (Right) active attack.
  • Figure 4: Defense methods against privacy attacks in FL: (Left) cryptography-based technique for model sharing and (Right) perturbation method for knowledge sharing.
  • Figure 5: Accuracy and privacy leakage in SVHN image classification task. Local datasets follow Dirichlet distribution with (a) $\alpha = 0.01$ and (b) $\alpha = 0.1$.
  • ...and 3 more figures
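
The Figure 5 caption refers to local datasets that follow a Dirichlet distribution with concentration parameter $\alpha$. The exact partitioning procedure used in the survey is not given in this summary, so the following is only a minimal sketch of one common label-skew scheme: for each class, client shares are drawn from a symmetric Dirichlet distribution, and smaller $\alpha$ yields more heterogeneous (non-IID) clients. The function name `dirichlet_partition` and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet label skew.

    Hypothetical helper for illustration; not the survey's own code.
    Returns a list of index lists, one per client.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]

    for cls in np.unique(labels):
        # Indices of all samples in this class, in random order.
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)

        # Share of this class assigned to each client (sums to 1).
        proportions = rng.dirichlet(alpha * np.ones(num_clients))

        # Convert cumulative shares into split positions within idx.
        cut_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)

        for client_id, shard in enumerate(np.split(idx, cut_points)):
            client_indices[client_id].extend(shard.tolist())

    return client_indices
```

Calling `dirichlet_partition(labels, num_clients=10, alpha=0.1)` assigns every sample to exactly one client. With small values such as $\alpha = 0.01$, most of each class tends to land on very few clients, so some clients may receive few or no samples.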