Efficient and Privacy-Preserved Link Prediction via Condensed Graphs
Yunbo Long, Liming Xu, Alexandra Brintrup
TL;DR
This work tackles privacy-preserving link prediction on large, sensitive networks by introducing HyDRO+, a graph condensation method guided by algebraic Jaccard similarity. HyDRO+ selects structurally central nodes, embeds them in hyperbolic space, and uses gradient matching so that the condensed graphs preserve the locality and spectral properties critical for link prediction while reducing data exposure. Across four real-world datasets, HyDRO+ achieves link prediction accuracy close to that of the original graphs, with substantial gains in training speed and storage efficiency and strong protection against membership-inference attacks. The approach enables secure, scalable sharing of link-prediction insights for inter-organizational collaboration in privacy-sensitive domains such as supply chains and product networks.
Abstract
Link prediction is crucial for uncovering hidden connections within complex networks, enabling applications such as identifying potential customers and products. However, link prediction research faces significant challenges, including data privacy concerns and high computational and storage costs, especially on large-scale networks. Condensed graphs, which are much smaller than the original graphs while retaining essential information, have become an effective way to both maintain data utility and preserve privacy. Existing methods, however, initialize synthetic graphs through random node selection without considering node connectivity, and are mainly designed for node classification tasks. As a result, their potential for privacy-preserving link prediction remains largely unexplored. We introduce HyDRO+, a graph condensation method guided by algebraic Jaccard similarity, which leverages local connectivity information to optimize condensed graph structures. Extensive experiments on four real-world networks show that our method outperforms state-of-the-art methods, and even the original networks, in balancing link prediction accuracy and privacy preservation. Moreover, compared to link prediction on the original networks, our method achieves nearly 20× faster training and reduces storage requirements by 452×, as demonstrated on the Computers dataset. This work represents the first attempt to leverage condensed graphs for privacy-preserving sharing of link prediction information in real-world complex networks. It offers a promising pathway for preserving link prediction information while safeguarding privacy, advancing the use of graph condensation in large-scale networks with privacy concerns.
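To make the notion of local connectivity concrete, the following minimal sketch computes the standard Jaccard coefficient between the neighbor sets of every node pair in a small toy graph. It is an illustration only: the abstract does not define the algebraic Jaccard similarity used by HyDRO+, which may differ from this textbook form, and the function name `jaccard_scores` and the edge list are hypothetical.

```python
from itertools import combinations


def jaccard_scores(edges):
    """Return {(u, v): Jaccard similarity of the neighbor sets of u and v}.

    Textbook Jaccard coefficient, |N(u) & N(v)| / |N(u) | N(v)|, on an
    undirected graph. This is an assumption for illustration, not the
    paper's exact "algebraic" formulation.
    """
    # Build an undirected neighbor map from the edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        common = neighbors[u] & neighbors[v]
        union = neighbors[u] | neighbors[v]
        scores[(u, v)] = len(common) / len(union) if union else 0.0
    return scores


# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant node 3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
scores = jaccard_scores(edges)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Pairs that share many neighbors relative to their combined neighborhoods score close to 1, which is why such a measure is a natural signal of local connectivity for link prediction.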
