A Survey on Federated Analytics: Taxonomy, Enabling Techniques, Applications and Open Issues

Zibo Wang, Haichao Ji, Yifei Zhu, Dan Wang, Zhu Han

TL;DR

Federated Analytics (FA) addresses privacy-preserving, distributed data analytics on edge data, distinguishing itself from Federated Learning by focusing on descriptive analytics and data mining tasks rather than model training. It formalizes FA as computing $F(x_1,...,x_n)$ via local insights $I_i(x_i)$ and a global aggregator $A$, while raw data remain private, and supports one-shot or iterative executions. The survey delivers a five-dimensional taxonomy (task, data-owner type, iteration pattern, coordination model, privatization methodology), reviews enabling techniques (LDP, CDP, DDP, k-anonymity, HE, MPC, sketching, data structures, optimization, and incentive design), and surveys applications across statistical metrics, frequency tasks, database operations, FL assistance, and wireless networks. It also discusses open issues and future directions toward more complex data scenarios, unified frameworks, privacy at scale, formal privacy metrics, system efficiency, and cross-layer design, aiming to guide future FA research and deployment in privacy-sensitive, distributed settings.
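The formalization above, computing $F(x_1,...,x_n)$ from local insights $I_i(x_i)$ combined by an aggregator $A$, can be illustrated with a minimal one-shot sketch for a global mean. The function names (`run_federated_round`, `mean_insight`, `mean_aggregate`) and the sample data are hypothetical and not taken from the survey, and no privatization step (such as LDP noise) is applied here; the sketch only shows that the server sees insights rather than raw data.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical names for illustration; the survey does not prescribe this API.
Insight = Tuple[float, int]  # (local sum, local count)


def mean_insight(values: Sequence[float]) -> Insight:
    """Local insight I_i(x_i): runs on the client, raw values never leave it."""
    return (sum(values), len(values))


def mean_aggregate(insights: List[Insight]) -> float:
    """Global aggregator A: combines per-client insights into the result."""
    total = sum(s for s, _ in insights)
    count = sum(n for _, n in insights)
    return total / count


def run_federated_round(
    local_data: List[Sequence[float]],
    extract: Callable[[Sequence[float]], Insight],
    aggregate: Callable[[List[Insight]], float],
) -> float:
    """One-shot FA execution: each client extracts, the server aggregates."""
    insights = [extract(x) for x in local_data]  # conceptually done on-device
    return aggregate(insights)


# Three simulated clients with private local datasets (made-up values).
clients = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]
global_mean = run_federated_round(clients, mean_insight, mean_aggregate)

# Reference value computed centrally, only to check the sketch.
centralized_mean = sum(v for c in clients for v in c) / sum(len(c) for c in clients)
```

In this sketch the aggregated result matches the centralized mean because the (sum, count) pair is a sufficient insight for that statistic; an iterative execution would repeat the round with an updated computation procedure.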

Abstract

The escalating influx of data generated by networked edge devices, coupled with the growing awareness of data privacy, has restricted the traditional data analytics workflow, where the edge data are gathered by a centralized server to be further utilized by data analysts. To continue leveraging vast edge data to support various data-intensive applications, computing paradigms have shifted from centralized data processing to privacy-preserving distributed data processing. The need to perform data analytics on private edge data motivates federated analytics (FA), an emerging technique to support collaborative data analytics among diverse data owners without centralizing the raw data. Despite the wide application of FA in industry and academia, a comprehensive examination of existing research efforts in FA has been notably absent. This survey aims to bridge this gap by first providing an overview of FA, elucidating key concepts, and discussing its relationship with similar concepts. We then thoroughly examine FA, including its key challenges, taxonomy, and enabling techniques. Diverse FA applications, including statistical metrics, frequency-related applications, database query operations, FL-assisting FA tasks, and other wireless network applications, are then carefully reviewed. We complete the survey with several open research issues, future directions, and a comprehensive lessons-learned section. This survey intends to provide a holistic understanding of the emerging FA techniques and foster the continued evolution of privacy-preserving distributed data processing in the emerging networked society.

Paper Structure (65 sections, 9 equations, 10 figures, 10 tables)


Figures (10)

  • Figure 1: Overview of the survey.
  • Figure 2: Architecture of FA. The left plot shows the interaction between the server and the clients. The clients, which hold privacy-sensitive local data, run the insight extractor's computation procedure to derive privacy-preserving insights, which are aggregated on the server side to derive results. The right plot shows the iterative workflow of FA, where an FA iteration consists of four phases: computation model distribution, local computation, insight upload, and insight aggregation.
  • Figure 3: Relationship of FA and related topics.
  • Figure 4: An illustration of our taxonomy.
  • Figure 5: Summary of DP variations, their enabling techniques, and existing works applying them. Sampling can satisfy CDP on its own and can also enhance the privacy preservation of any existing CDP/LDP scheme. Applications following the latter approach are marked red in the figure.
  • ...and 5 more figures

Theorems & Definitions (2)

  • Definition 1: Local differential privacy
  • Definition 2: Central differential privacy