Making Privacy-preserving Federated Graph Analytics with Strong Guarantees Practical (for Certain Queries)

Kunlong Liu, Trinabh Gupta

TL;DR

Colo targets privacy-preserving federated graph analytics under a strong threat model with malicious devices, splitting computation across a set of servers and devices that communicate over a metadata-hiding network. Its tailored secure computation protocol exploits bounded input domains: for each neighbor, the query outputs for every possible input are enumerated, oblivious transfer (OT) privately selects the output matching the actual input, and zero-knowledge proofs ensure that results stay within bounds. Running this over metadata-hiding communication protects the graph topology. Global aggregation uses additive secret sharing across servers, so the final result is revealed only to the analyst as long as at least one server remains honest. Empirically, Colo reduces device-side CPU and network costs by orders of magnitude compared to prior work (e.g., Mycelium), making large-scale epidemiological queries practical to deploy, though only for a restricted class of queries.

Abstract

Privacy-preserving federated graph analytics is an emerging area of research. The goal is to run graph analytics queries over a set of devices that are organized as a graph while keeping the raw data on the devices rather than centralizing it. Further, no entity may learn any new information except for the final query result. For instance, a device may not learn a neighbor's data. The state-of-the-art prior work for this problem provides privacy guarantees for a broad set of queries in a strong threat model where the devices can be malicious. However, it imposes an impractical overhead: each device locally requires over 8.79 hours of CPU time and 5.73 GiB of network transfers per query. This paper presents Colo, a new, low-cost system for privacy-preserving federated graph analytics that requires minutes of CPU time and a few MiB in network transfers, for a particular subset of queries. At the heart of Colo is a new secure computation protocol that enables a device to securely and efficiently evaluate a graph query in its local neighborhood while hiding device data, edge data, and topology data. An implementation and evaluation of Colo show that for running a variety of COVID-19 queries over a population of 1M devices, it requires less than 8.4 minutes of a device's CPU time and 4.93 MiB in network transfers, improvements of up to three orders of magnitude.
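
The TL;DR notes that global aggregation relies on additive secret sharing across servers. The following is a minimal sketch of that general technique, not Colo's actual implementation: the modulus, the function names, and the example values are all assumptions chosen for illustration. Each device splits its local result into random shares that sum to the result, each server adds up the shares it receives, and only the combination of all servers' partial sums reveals the total.

```python
import secrets

# Hypothetical modulus (an assumption); it only needs to exceed the largest possible total.
MODULUS = 2**61 - 1

def share(value, num_servers):
    """Split value into num_servers additive shares modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(num_servers - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def aggregate(per_device_shares):
    """Each server sums the shares it received; returns one partial sum per server."""
    num_servers = len(per_device_shares[0])
    return [
        sum(device[s] for device in per_device_shares) % MODULUS
        for s in range(num_servers)
    ]

def reconstruct(partial_sums):
    """Combine all servers' partial sums to recover the global total."""
    return sum(partial_sums) % MODULUS

# Illustrative local results from four devices, shared across three servers.
local_results = [3, 0, 5, 1]
device_shares = [share(v, 3) for v in local_results]
server_sums = aggregate(device_shares)
total = reconstruct(server_sums)
```

Any proper subset of the servers sees only uniformly random values, which is why a single honest server is enough to keep individual device results hidden in this sketch.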

Paper Structure (23 sections, 9 figures)

This paper contains 23 sections and 9 figures.

Figures (9)

  • Figure 1: Example graph queries from Mycelium [roth2021mycelium] and the literature on health analytics [park2020contact, nikolay2019transmission, mossong2008social, laxminarayan2020epidemiology, jing2020household, gudbjartsson2020spread, danon2013social, bi2020epidemiology, adam2020clustering]. We assume that the domain of the inputs to these queries is bounded, for example, $inf \in [0,1]$ and $tinf \in [1,120]$, referring to the days in the latest few months.
  • Figure 2: An overview of Colo's query distribution, local aggregation, and global aggregation phases of query execution. The dotted line in local aggregation depicts metadata-hiding communication, and the dashed line depicts secure computation.
  • Figure 3: A high-level description of Colo's phases.
  • Figure 4: Colo's local aggregation.
  • Figure 5: Commonly used graph datasets from the Stanford Large Network Dataset Collection (SNAP) [snapnets].
  • ...and 4 more figures
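
The bounded input domains mentioned in the Figure 1 caption are what make output enumeration feasible. As a rough sketch of the idea only: the predicate below is a hypothetical query invented for illustration (it is not taken from Figure 1), `inf` is assumed to take the integer values 0 or 1, and the private selection that Colo performs with oblivious transfer and zero-knowledge proofs is replaced here by a plain dictionary lookup.

```python
from itertools import product

# Bounded input domains from the Figure 1 caption (integer values assumed).
INF_DOMAIN = range(0, 2)      # inf in [0, 1]
TINF_DOMAIN = range(1, 121)   # tinf in [1, 120]

def hypothetical_query(self_inf, self_tinf, nbr_inf, nbr_tinf):
    """Illustrative predicate (an assumption, not from the paper):
    1 if both are infected and the neighbor's infection follows within 14 days."""
    return int(self_inf == 1 and nbr_inf == 1 and 0 < nbr_tinf - self_tinf <= 14)

def enumerate_outputs(self_inf, self_tinf):
    """Tabulate the query output for every possible neighbor input."""
    return {
        (nbr_inf, nbr_tinf): hypothetical_query(self_inf, self_tinf, nbr_inf, nbr_tinf)
        for nbr_inf, nbr_tinf in product(INF_DOMAIN, TINF_DOMAIN)
    }

# One party tabulates all outputs; the entry for the neighbor's actual input is selected.
# In a real protocol this selection would be private; here it is a plain lookup.
table = enumerate_outputs(self_inf=1, self_tinf=100)
selected = table[(1, 110)]
```

With these domains the table has only 240 entries per neighbor, which suggests why bounding the inputs keeps the per-neighbor cost small.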