Structure-Preference Enabled Graph Embedding Generation under Differential Privacy

Sen Zhang, Qingqing Ye, Haibo Hu

TL;DR

This paper addresses the privacy risks of publishing graph embeddings by introducing SE-PrivGEmb, a skip-gram–based method that supports user-defined structure preferences under node-level differential privacy. It introduces a noise-tolerance mechanism that perturbs only non-zero gradient components and designs negative sampling to preserve arbitrary node proximities, achieving strong utility under DP with node-level Rényi DP guarantees. The authors provide formal analysis showing structure-preference preservation and privacy composition, and validate the approach on six real-world networks, where SE-PrivGEmb outperforms state-of-the-art private methods in structural equivalence and link prediction. The work enables customizable, privacy-preserving graph embeddings with practical DP guarantees and broad applicability to privacy-conscious graph mining tasks.

Abstract

Graph embedding generation techniques aim to learn low-dimensional vectors for each node in a graph and have recently gained increasing research attention. Publishing low-dimensional node vectors enables various graph analysis tasks, such as structural equivalence and link prediction. Yet, improper publication opens a backdoor for malicious attackers, who can infer sensitive information about individuals from the low-dimensional node vectors. Existing methods tackle this issue by developing deep graph learning models with differential privacy (DP). However, they often suffer from large noise injections and cannot provide structural preferences consistent with mining objectives. Skip-gram based graph embedding generation techniques have recently become widely used due to their ability to extract customizable structures. Based on skip-gram, we present SE-PrivGEmb, a structure-preference enabled graph embedding generation method under DP. For arbitrary structure preferences, we design a unified noise tolerance mechanism that perturbs only non-zero vectors. This mechanism mitigates the utility degradation caused by high sensitivity. By carefully designing negative sampling probabilities in skip-gram, we theoretically demonstrate that skip-gram can preserve arbitrary proximities, which quantify structural features in graphs. Extensive experiments show that our method outperforms existing state-of-the-art methods on structural equivalence and link prediction tasks.

Paper Structure

This paper contains 28 sections, 5 theorems, 21 equations, 4 figures, 6 tables, 2 algorithms.

Key Result

Theorem 1

If $\mathcal{A}$ is an $(\alpha, \epsilon)$-RDP algorithm, then it also satisfies $(\epsilon+\frac{\log (1 / \delta)}{\alpha-1}, \delta)$-DP for any $\delta \in(0,1)$.
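
The conversion in Theorem 1 is a closed-form expression, so a short sketch may make it concrete. The function names `rdp_to_dp` and `best_dp_epsilon` and the RDP values in the usage note are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def rdp_to_dp(alpha: float, rdp_epsilon: float, delta: float) -> float:
    """Return the epsilon of the (epsilon, delta)-DP guarantee implied by
    (alpha, rdp_epsilon)-RDP, following the formula in Theorem 1."""
    if alpha <= 1.0:
        raise ValueError("alpha must be greater than 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in the open interval (0, 1)")
    return rdp_epsilon + math.log(1.0 / delta) / (alpha - 1.0)


def best_dp_epsilon(rdp_curve, delta: float) -> float:
    """Given several (alpha, rdp_epsilon) pairs that all hold for the same
    algorithm, return the smallest epsilon the conversion yields."""
    return min(rdp_to_dp(alpha, eps, delta) for alpha, eps in rdp_curve)
```

As a usage example, `best_dp_epsilon([(2.0, 1.0), (11.0, 1.5)], 1e-5)` evaluates the conversion at both orders and keeps the smaller result. In this made-up example the larger order gives the tighter bound, because its larger $\alpha - 1$ shrinks the $\log(1/\delta)$ term by more than its RDP epsilon grows.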

Figures (4)

  • Figure 1: Architecture of a Skip-gram Model. The embedding matrices $\mathbf{W}_{in}$, with a size of $|V|\times{r}$, and $\mathbf{W}_{out}^\top$, with a size of $r\times|V|$, are two model parameters that require optimization. For any node pair $(v_i, v_j)$, $\mathbf{v}_i$ represents the vector for $v_i$ in the input weight matrix $\mathbf{W}_{in}$, while $\mathbf{v}_j$ represents the vector for $v_j$ in the output weight matrix $\mathbf{W}_{out}$.
  • Figure 2: An Illustration of Private Update for $\mathbf{W}_{in}$. (a) represents the original input weight, (b) represents the gradient of the input weight, while (c) and (d) depict the perturbed input weight using Eq. (\ref{eq:novNoiseGra_on_vi}) and Eq. (\ref{eq:nonZeroNoiseGra_on_vi}), respectively.
  • Figure 3: Impact of Privacy Budget on Structural Equivalence
  • Figure 4: Impact of Privacy Budget on Link Prediction

Theorems & Definitions (13)

  • Definition 1: Edge (Node)-Level DP [hay2009accurate]
  • Definition 2: RDP [mironov2017renyi]
  • Theorem 1: RDP conversion to $(\epsilon, \delta)$-DP [mironov2017renyi]
  • Definition 3: Sensitivity [dwork2006calibrating]
  • Definition 4: Node Proximity Matrix
  • Definition 5: Private Graph Embedding Generation under Bounded DP
  • Theorem 2
  • Theorem 3
  • proof
  • Definition 6: Subsample [wang2019subsampled]
  • ...and 3 more