Making Privacy-preserving Federated Graph Analytics with Strong Guarantees Practical (for Certain Queries)
Kunlong Liu, Trinabh Gupta
TL;DR
Colo tackles privacy-preserving federated graph analytics under a strong threat model in which devices may be malicious, splitting the computation between a set of servers and devices that communicate over a metadata-hiding network. It introduces a tailored secure computation protocol in which a device enumerates the possible outputs over each neighbor's bounded input, uses oblivious transfer (OT) for private selection, and uses zero-knowledge proofs to ensure that results stay within bounds; the metadata-hiding network protects the graph structure. Global aggregation relies on additive secret sharing across the servers, so the final result is revealed only to the analyst as long as at least one server remains honest. Empirically, Colo reduces device-side CPU and network costs by orders of magnitude compared to prior work (e.g., Mycelium) and shows practical deployment potential for large-scale epidemiological queries, at the cost of supporting a restricted class of queries.
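The aggregation step mentioned above can be illustrated with a minimal sketch of additive secret sharing. This is not Colo's actual protocol: the modulus, the function names (`share`, `aggregate`, `reconstruct`), and the three-server setup are assumptions chosen for illustration, and the sketch omits the OT, zero-knowledge proofs, and malicious-security checks that the summary describes.

```python
import secrets

# Assumed ring size for the shares; a hypothetical choice for this sketch.
MODULUS = 2**64


def share(value, num_servers):
    """Split `value` into `num_servers` additive shares modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(num_servers - 1)]
    # The last share is chosen so that all shares sum to `value`.
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def aggregate(per_device_shares):
    """Each server adds up the shares it received, one from every device."""
    num_servers = len(per_device_shares[0])
    return [
        sum(device[s] for device in per_device_shares) % MODULUS
        for s in range(num_servers)
    ]


def reconstruct(server_sums):
    """The analyst combines the per-server sums to obtain the global total."""
    return sum(server_sums) % MODULUS


# Hypothetical local query results held by four devices.
local_results = [3, 0, 5, 2]
device_shares = [share(v, 3) for v in local_results]
server_sums = aggregate(device_shares)
total = reconstruct(server_sums)
```

Every proper subset of a device's shares is uniformly random, so the servers learn a device's local result only if all of them pool their shares. This is why a single honest server suffices to keep the individual inputs hidden while the analyst still recovers the sum.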
Abstract
Privacy-preserving federated graph analytics is an emerging area of research. The goal is to run graph analytics queries over a set of devices that are organized as a graph while keeping the raw data on the devices rather than centralizing it. Further, no entity may learn any new information except for the final query result. For instance, a device may not learn a neighbor's data. The state-of-the-art prior work for this problem provides privacy guarantees for a broad set of queries in a strong threat model where the devices can be malicious. However, it imposes an impractical overhead: each device locally requires over 8.79 hours of CPU time and 5.73 GiB of network transfers per query. This paper presents Colo, a new, low-cost system for privacy-preserving federated graph analytics that requires minutes of CPU time and a few MiB of network transfers, for a particular subset of queries. At the heart of Colo is a new secure computation protocol that enables a device to securely and efficiently evaluate a graph query in its local neighborhood while hiding device data, edge data, and topology data. An implementation and evaluation of Colo show that, for a variety of COVID-19 queries over a population of 1M devices, it requires less than 8.4 minutes of a device's CPU time and 4.93 MiB of network transfers, an improvement of up to three orders of magnitude.
