The Federation Strikes Back: A Survey of Federated Learning Privacy Attacks, Defenses, Applications, and Policy Landscape
Joshua C. Zhao, Saurabh Bagchi, Salman Avestimehr, Kevin S. Chan, Somali Chaterji, Dimitris Dimitriadis, Jiacheng Li, Ninghui Li, Arash Nourian, Holger R. Roth
TL;DR
This survey tackles the privacy challenges of federated learning (FL) by cataloging attacks (data reconstruction, membership inference, property inference, and model extraction) and defenses (differential privacy, secure aggregation, and homomorphic encryption) across FL variants (cross-device, cross-silo, horizontal, vertical, hierarchical, and FL-derived forms). It grounds the technical discussion in concrete application domains (healthcare, finance, IoT/edge) and examines the evolving policy landscape (GDPR, US privacy laws, AI-related acts) that shapes deployment. A central contribution is its synthesis of the attack-defense-policy triad with real-world deployment insights and open questions. It highlights the need for certifiable protections, realistic utility-privacy tradeoffs, and robust handling of non-IID data and heterogeneous devices. The work emphasizes that, despite FL's privacy-by-design promise, sophisticated attacks persist, which motivates multi-layer defenses and regulatory alignment to enable practical, privacy-preserving FL in safety- and privacy-critical applications.
Abstract
Deep learning has shown incredible potential across a wide array of tasks, and this growth has been accompanied by an insatiable appetite for data. However, much of the data needed to train deep learning models is stored on personal devices, and recent privacy concerns have made such data even harder to access. As a result, federated learning (FL) has emerged as an important privacy-preserving technology that enables collaborative training of machine learning models without sending the raw, potentially sensitive, data to a central server. However, the fundamental premise that sending model updates to a server preserves privacy holds only if those updates cannot be "reverse engineered" to infer information about the private training data. Prior work has shown that this premise fails under a wide variety of settings. In this survey paper, we provide a comprehensive literature review of privacy attacks and defense methods in FL. We identify the current limitations of these attacks and highlight the settings in which the privacy of an FL client can be broken. We further dissect successful industry applications of FL and draw lessons for future adoption. Finally, we survey the emerging landscape of privacy regulation for FL and conclude with future directions toward the cherished goal of generating accurate models while preserving the privacy of participants' data.
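As a minimal illustration of how a model update can be "reverse engineered", consider the well-known analytic inversion for a single fully-connected layer with a bias term. For y = Wx + b, the gradients satisfy dL/dW = (dL/dy) xᵀ and dL/db = dL/dy, so any row i with a nonzero bias gradient yields x = (dL/dW)ᵢ / (dL/db)ᵢ exactly. The sketch below assumes a batch size of one, a squared-error loss, and plain gradients with no differential-privacy noise or secure aggregation. The function names are hypothetical and this is not a method from the survey itself; deeper networks and larger batches require the more elaborate reconstruction attacks the survey covers.

```python
import random

def linear_forward(W, b, x):
    # y = W x + b for a single fully-connected layer (hypothetical toy model)
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def client_gradients(W, b, x, target):
    # Client side: squared-error loss L = 0.5 * ||y - target||^2 on ONE private example.
    y = linear_forward(W, b, x)
    dL_dy = [y_i - t_i for y_i, t_i in zip(y, target)]
    grad_W = [[g * x_j for x_j in x] for g in dL_dy]  # dL/dW = (dL/dy) x^T
    grad_b = list(dL_dy)                              # dL/db = dL/dy
    return grad_W, grad_b  # this is the "update" sent to the server

def reconstruct_input(grad_W, grad_b):
    # Server side: divide any weight-gradient row by its nonzero bias gradient to recover x.
    for row, g in zip(grad_W, grad_b):
        if abs(g) > 1e-12:
            return [w / g for w in row]
    raise ValueError("all bias gradients are zero; input not recoverable this way")

if __name__ == "__main__":
    rng = random.Random(0)
    n_in, n_out = 4, 3
    W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-1, 1) for _ in range(n_out)]
    private_x = [0.2, -1.5, 3.0, 0.7]  # illustrative private input, never sent directly
    target = [1.0, 0.0, 0.0]
    grad_W, grad_b = client_gradients(W, b, private_x, target)
    recovered = reconstruct_input(grad_W, grad_b)
    print([round(v, 6) for v in recovered])
```

The server never sees `private_x`, yet the printed vector matches it up to floating-point rounding, which is why defenses such as differential privacy and secure aggregation target exactly these per-client updates.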
