Common Neighborhood Estimation over Bipartite Graphs under Local Differential Privacy
Yizhang He, Kai Wang, Wenjie Zhang, Xuemin Lin, Ying Zhang
TL;DR
This work estimates the number of common neighbors $\mathcal{C}_2(u,w)$ between two vertices on the same layer of a bipartite graph under $\varepsilon$-edge LDP. Starting from a naive baseline, it progressively designs unbiased estimators: OneR, which counteracts overcounting in a single round, and a multi-round framework with single-source (MultiR-SS) and double-source (MultiR-DS) estimators, which reduce variance by restricting the candidate pool and optimizing the privacy budget allocation. The proposed estimators are proven unbiased and their L2 losses are analyzed. Experiments on 15 real datasets show that MultiR-SS and MultiR-DS achieve substantially lower errors than Naive and OneR, and that MultiR-DS is particularly robust to degree imbalance between the query vertices. The resulting common-neighborhood estimates can feed into downstream tasks such as similarity measures and biclique counting, while keeping time and communication costs practical under edge LDP.
Abstract
Bipartite graphs, formed by two vertex layers, are a natural fit for modeling the relationships between two groups of entities. In bipartite graphs, computing the common neighborhood of two vertices on the same layer is a basic operator that is easy to solve in general settings. However, it inevitably involves releasing the neighborhood information of vertices, which poses a significant privacy risk for users in real-world applications. To protect edge privacy in bipartite graphs, we study the problem of estimating the number of common neighbors of two vertices on the same layer under edge local differential privacy (edge LDP). The problem is challenging under edge LDP because every vertex on the layer opposite the query vertices is a potential common neighbor. To obtain efficient and accurate estimates, we propose a multiple-round framework that significantly reduces the candidate pool of common neighbors and enables the query vertices to construct unbiased estimators locally. Furthermore, we improve data utility by combining the estimators built from the neighbors of both query vertices and by optimizing the privacy budget allocation. These improvements make the estimator more robust and consistent, particularly for query vertices with imbalanced degrees. Extensive experiments on 15 datasets validate the effectiveness and efficiency of the proposed techniques.
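To make the overcounting issue concrete, the following is a minimal sketch, not the paper's algorithms. It assumes each query vertex perturbs its neighbor bit vector over the opposite layer with standard randomized response, flipping each bit with probability $p = 1/(1+e^{\varepsilon})$. Counting positions where both noisy bits are 1 then overcounts on sparse inputs, while rescaling each noisy bit as $(a' - p)/(1 - 2p)$ before multiplying gives an unbiased single-round estimate. All function names are hypothetical, and the multi-round estimators and budget allocation are not reproduced here.

```python
import math
import random


def flip_probability(epsilon):
    # Randomized-response flip probability for privacy budget epsilon > 0.
    return 1.0 / (1.0 + math.exp(epsilon))


def randomized_response(bits, epsilon, rng):
    # Flip each 0/1 neighbor bit independently with probability p.
    p = flip_probability(epsilon)
    return [b ^ 1 if rng.random() < p else b for b in bits]


def naive_count(noisy_u, noisy_w):
    # Count candidates that look like common neighbors in the noisy lists.
    # Biased: noisy 1s on non-edges inflate the count on sparse graphs.
    return sum(a & b for a, b in zip(noisy_u, noisy_w))


def debiased_count(noisy_u, noisy_w, epsilon):
    # E[a'] = p + a * (1 - 2p), so (a' - p) / (1 - 2p) is unbiased for a.
    # The two lists are perturbed independently, so the product of the
    # rescaled bits is unbiased for a_u * a_w; summing over candidates
    # gives an unbiased estimate of the common-neighbor count.
    p = flip_probability(epsilon)
    scale = 1.0 - 2.0 * p
    return sum(((a - p) / scale) * ((b - p) / scale)
               for a, b in zip(noisy_u, noisy_w))


def average_estimates(bits_u, bits_w, epsilon, trials, seed):
    # Average both estimators over repeated perturbations (illustration only;
    # a real deployment would release each noisy list once).
    rng = random.Random(seed)
    naive_total, debiased_total = 0.0, 0.0
    for _ in range(trials):
        noisy_u = randomized_response(bits_u, epsilon, rng)
        noisy_w = randomized_response(bits_w, epsilon, rng)
        naive_total += naive_count(noisy_u, noisy_w)
        debiased_total += debiased_count(noisy_u, noisy_w, epsilon)
    return naive_total / trials, debiased_total / trials
```

The debiased estimate still sums noise over every candidate on the opposite layer, so its variance grows with the layer size. This is the motivation the abstract gives for shrinking the candidate pool in the multiple-round framework.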
