A Survey of What to Share in Federated Learning: Perspectives on Model Utility, Privacy Leakage, and Communication Efficiency
Jiawei Shao, Zijian Li, Wenqiang Sun, Tailin Zhou, Yuchang Sun, Lumin Liu, Zehong Lin, Yuyi Mao, Jun Zhang
TL;DR
This paper addresses the question of what to share in federated learning beyond model parameters by proposing a taxonomy of three sharing modalities: model parameters, synthetic data, and knowledge. It systematically reviews how each modality affects model utility, privacy leakage, and communication efficiency, supported by experiments on SVHN and CIFAR-10 with non-IID data. The authors analyze privacy attacks (targeting gradients, parameters, logits, and intermediate features) and defenses (cryptography-based and perturbation-based), and demonstrate the tradeoffs with empirical results and Pareto front analyses. The work highlights limitations of predominantly parameter-sharing FL and suggests hybrid aggregation and privacy-preserving data generation as promising directions for future research.
Abstract
Federated learning (FL) has emerged as a secure paradigm for collaborative training among clients. Without data centralization, FL allows clients to share local information in a privacy-preserving manner. This approach has gained considerable attention, prompting numerous surveys that summarize the related work. However, most of these surveys concentrate on FL methods that share model parameters during training and overlook the possibility of sharing local information in other forms. In this paper, we present a systematic survey from a new perspective, namely what to share in FL, with an emphasis on model utility, privacy leakage, and communication efficiency. First, we present a new taxonomy of FL methods based on three types of shared information: model parameters, synthetic data, and knowledge. Second, we analyze the vulnerability of the different sharing methods to privacy attacks and review the defense mechanisms. Third, we conduct extensive experiments to compare the learning performance and communication overhead of various sharing methods in FL. In addition, we assess the potential privacy leakage through model inversion and membership inference attacks and compare the effectiveness of various defense approaches. Finally, we identify future research directions and conclude the survey.
