Privacy-Preserving Graph Embedding based on Local Differential Privacy

Zening Li, Rong-Hua Li, Meihao Liao, Fusheng Jin, Guoren Wang

TL;DR

This work tackles privacy in graph embedding by introducing PrivGE, a local differential privacy (LDP) framework that privatizes high-dimensional node features with the HDS mechanism and decouples feature transformation from graph propagation. Embeddings are learned through personalized PageRank-based propagation, which keeps representations robust even under stringent privacy budgets $\epsilon$. Theoretical analysis gives a utility bound of $\max_j|\tilde{z}_{v,j}-z_{v,j}| = O(\log(d/\delta))$, tighter than the bounds for bounded mechanisms, and experiments on five real-world datasets show state-of-the-art results in node classification and link prediction under LDP. The work is a step toward privacy-preserving graph learning in decentralized settings, with implications for social networks and other sensitive graph-structured data.

Abstract

Graph embedding has become a powerful tool for learning latent representations of nodes in a graph. Despite its superior performance in various graph-based machine learning tasks, serious privacy concerns arise when the graph data contains personal or sensitive information. To address this issue, we investigate and develop graph embedding algorithms that satisfy local differential privacy (LDP). We introduce a novel privacy-preserving graph embedding framework, named PrivGE, to protect node data privacy. Specifically, we propose an LDP mechanism to obfuscate node data and utilize personalized PageRank as the proximity measure to learn node representations. Furthermore, we provide a theoretical analysis of the privacy guarantees and utility offered by the PrivGE framework. Extensive experiments on several real-world graph datasets demonstrate that PrivGE achieves an optimal balance between privacy and utility, and significantly outperforms existing methods in node classification and link prediction tasks.
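
The personalized PageRank propagation mentioned above can be sketched as a simple power iteration. This is a minimal illustration only: the function name `ppr_propagate`, the teleport probability `alpha`, the iteration count, and the row-normalized transition matrix are all assumptions made for this example, not details taken from the paper. The sketch also leaves out the LDP perturbation of node features.

```python
import numpy as np


def ppr_propagate(adj, features, alpha=0.15, num_iters=50):
    """Smooth node features with a personalized-PageRank-style iteration.

    Iterates z <- alpha * x + (1 - alpha) * P @ z, where P is the
    row-normalized adjacency matrix (an assumed normalization).
    The fixed point is alpha * (I - (1 - alpha) * P)^{-1} @ x.
    """
    adj = np.asarray(adj, dtype=float)
    x = np.asarray(features, dtype=float)

    # Row-normalize the adjacency matrix; isolated nodes keep a zero row.
    deg = adj.sum(axis=1)
    deg[deg == 0] = 1.0
    P = adj / deg[:, None]

    z = x.copy()
    for _ in range(num_iters):
        z = alpha * x + (1 - alpha) * (P @ z)
    return z


if __name__ == "__main__":
    # Toy triangle graph with two-dimensional node features.
    adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
    feats = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]])
    print(ppr_propagate(adj, feats, alpha=0.2, num_iters=100))
```

Because propagation here only post-processes whatever features it is given, feeding it already-perturbed features would not consume additional privacy budget under the post-processing property.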


Paper Structure

This paper contains 26 sections, 8 theorems, 23 equations, 4 figures, 2 tables, 3 algorithms.

Key Result

Proposition 2.2

Given the sequence of computations $\mathcal{A}_{1}, \mathcal{A}_{2}, \dots, \mathcal{A}_{k}$, if each $\mathcal{A}_{i}$ satisfies $\epsilon_{i}$-LDP, then their sequential execution on the same dataset satisfies $\sum_{i}\epsilon_{i}$-LDP.
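
Sequential composition can be illustrated with simple budget accounting. The sketch below perturbs each coordinate of one user's vector with its own $\epsilon_i$ and reports the total budget as the sum. It uses the standard Laplace mechanism as a stand-in, not the paper's HDS mechanism; the function names `laplace_perturb` and `perturb_vector` and the value range $[-1, 1]$ are assumptions for illustration.

```python
import numpy as np


def laplace_perturb(value, epsilon, rng, lower=-1.0, upper=1.0):
    """Release one bounded scalar under epsilon-LDP with Laplace noise.

    Any two inputs in [lower, upper] differ by at most (upper - lower),
    so Laplace noise with scale (upper - lower) / epsilon suffices.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = upper - lower
    clipped = min(max(value, lower), upper)
    return clipped + rng.laplace(loc=0.0, scale=sensitivity / epsilon)


def perturb_vector(values, budgets, rng):
    """Perturb each coordinate with its own budget.

    By sequential composition, releasing all coordinates together
    costs sum(budgets) in total.
    """
    if len(values) != len(budgets):
        raise ValueError("values and budgets must have the same length")
    noisy = [laplace_perturb(v, e, rng) for v, e in zip(values, budgets)]
    return np.array(noisy), float(sum(budgets))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy, total_epsilon = perturb_vector([0.1, -0.5, 0.9], [0.5, 0.5, 1.0], rng)
    print(noisy, total_epsilon)
```

Splitting a fixed budget evenly across many coordinates makes the per-coordinate noise scale grow with the number of coordinates, which is the usual difficulty with high-dimensional features under LDP.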

Figures (4)

  • Figure 1: The worst-case noise variance vs. privacy budget for one-dimensional data.
  • Figure 2: Trade-offs between privacy and accuracy under different LDP mechanisms in node classification. Note that the error bars represent the standard deviation and the results for $\infty$ denote the accuracy of the non-private baselines.
  • Figure 3: Effect of the sampling parameter $k$ on the performance of PrivGE for node classification.
  • Figure 4: Effect of the sampling parameter $k$ on the performance of PrivGE for link prediction.

Theorems & Definitions (16)

  • Definition 2.1: $\epsilon$-Local Differential Privacy [kasiviswanathan2011can]
  • Proposition 2.2: Sequential Composition [day2016publishing]
  • Proposition 2.3: Post-processing [day2016publishing]
  • Theorem 3.1
  • proof
  • Lemma 3.2
  • proof
  • Theorem 3.3
  • proof
  • Theorem 3.4
  • ...and 6 more