AdvSGM: Differentially Private Graph Learning via Adversarial Skip-gram Model

Sen Zhang, Qingqing Ye, Haibo Hu, Jianliang Xu

TL;DR

AdvSGM addresses the privacy risks of graph skip-gram embeddings by introducing a differential privacy framework for graphs that leverages adversarial training. The core idea is to privatize the skip-gram via two optimizable noise terms embedded in the adversarial module and to achieve gradient perturbation by carefully tuning the relative weights between the skip-gram and adversarial components, ensuring node-level $(\epsilon,\delta)$-DP through post-processing. The authors characterize privacy with Rényi differential privacy and subsampling amplification, derive a practical training algorithm whose complexity scales linearly with the batch size, and prove DP guarantees for the discriminator that transfer to the generator. Empirically, AdvSGM outperforms state-of-the-art private graph embeddings on link prediction and node clustering across six real-world datasets, especially at moderate privacy budgets, demonstrating a favorable privacy-utility trade-off that enables private graph representations for downstream tasks.

Abstract

The skip-gram model (SGM), which employs a neural network to generate node vectors, serves as the basis for numerous popular graph embedding techniques. However, since the training datasets contain sensitive linkage information, the parameters of a released SGM may encode private information and pose significant privacy risks. Differential privacy (DP) is a rigorous standard for protecting individual privacy in data analysis. Nevertheless, applying differential privacy to skip-gram on graphs is highly challenging because the complex link relationships can result in high sensitivity and necessitate substantial noise injection. To tackle this challenge, we present AdvSGM, a differentially private skip-gram for graphs via adversarial training. Our core idea is to leverage adversarial training to privatize skip-gram while improving its utility. To this end, we develop a novel adversarial training module by devising two optimizable noise terms that correspond to the parameters of a skip-gram. By fine-tuning the weights between modules within AdvSGM, we can achieve differentially private gradient updates without additional noise injection. Extensive experimental results on six real-world graph datasets show that AdvSGM preserves high data utility across different downstream tasks.

Paper Structure

This paper contains 28 sections, 7 theorems, 26 equations, 4 figures, 5 tables, 3 algorithms.

Key Result

Theorem 1

Let $f(\mathcal{G})$ be $\left(\epsilon_1, \delta_1\right)$-DP and $g(\mathcal{G})$ be $\left(\epsilon_2, \delta_2\right)$-DP. Then the mechanism $F(\mathcal{G}) = (f(\mathcal{G}), g(\mathcal{G}))$, which releases both results, satisfies $\left(\epsilon_1+\epsilon_2, \delta_1+\delta_2\right)$-DP.
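The bookkeeping implied by Theorem 1 can be sketched in a few lines of Python. This is only an illustration of the composition rule stated above, not code from the paper: the function name `compose_sequential` and the example budgets are hypothetical.

```python
from typing import Iterable, List, Tuple


def compose_sequential(budgets: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Return the (epsilon, delta) guarantee for releasing all outputs together.

    Each entry in `budgets` is the (epsilon_i, delta_i) guarantee of one
    mechanism run on the same graph. Under sequential composition, the
    epsilons add up and the deltas add up.
    """
    items: List[Tuple[float, float]] = list(budgets)
    for eps, delta in items:
        if eps < 0:
            raise ValueError("epsilon must be non-negative")
        if not 0.0 <= delta <= 1.0:
            raise ValueError("delta must lie in [0, 1]")
    eps_total = sum(eps for eps, _ in items)
    delta_total = sum(delta for _, delta in items)
    return eps_total, delta_total


# Hypothetical budgets for two mechanisms f and g (values chosen for illustration).
total_eps, total_delta = compose_sequential([(0.5, 1e-6), (1.0, 1e-6)])
```

With these illustrative inputs, releasing both results costs the sum of the two budgets, which is why tighter accounting tools such as Rényi DP are attractive when many training steps are composed.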

Figures (4)

  • Figure 1: Architecture of AdvSGM. The discriminator can be divided into two modules: skip-gram (graph structure preservation module) for learning the features of the input data, and adversarial training module for improving the performance of skip-gram. Two generators are employed to generate fake neighbors for the real node pair $(v_i,v_j)$. These fake node pairs are designed to deceive the discriminator with a high probability, while the discriminator is trained to distinguish between real and fake node pairs.
  • Figure 2: Effect of weight settings across different datasets.
  • Figure 3: Impact of Privacy Budget on Link Prediction.
  • Figure 4: Impact of Privacy Budget on Node Clustering.

Theorems & Definitions (15)

  • Remark 1
  • Remark 2
  • Definition 1: Edge (Node)-Level DP [hay2009accurate]
  • Theorem 1: Sequential Composition [dwork2014algorithmic]
  • Theorem 2: Post-Processing [dwork2014algorithmic]
  • Definition 2: RDP [mironov2017renyi]
  • Theorem 3: From RDP to $(\epsilon, \delta)$-DP [mironov2017renyi]
  • Theorem 4: RDP for Subsampled Mechanisms [wang2019subsampled]
  • Definition 3: Adversarial Skip-gram under Bounded DP
  • Theorem 5
  • ...and 5 more
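
The list above gives Theorem 3 only by title, so the paper's exact statement is not reproduced here. As a hedged sketch, the Python below uses the standard conversion from the Rényi DP literature: a mechanism that is $(\alpha, \epsilon_{\mathrm{RDP}})$-RDP is also $(\epsilon_{\mathrm{RDP}} + \log(1/\delta)/(\alpha-1), \delta)$-DP for any $0 < \delta < 1$. The function names and the Gaussian-mechanism curve in the example are assumptions made for illustration and are not taken from AdvSGM.

```python
import math
from typing import Dict


def rdp_to_dp(alpha: float, rdp_eps: float, delta: float) -> float:
    """Convert an (alpha, rdp_eps)-RDP guarantee into an epsilon for (epsilon, delta)-DP."""
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return rdp_eps + math.log(1.0 / delta) / (alpha - 1.0)


def best_dp_epsilon(rdp_curve: Dict[float, float], delta: float) -> float:
    """Pick the tightest epsilon over all RDP orders available in `rdp_curve`."""
    return min(rdp_to_dp(alpha, eps, delta) for alpha, eps in rdp_curve.items())


# Illustrative RDP curve (assumption): a Gaussian mechanism with sensitivity 1
# and noise scale sigma satisfies (alpha, alpha / (2 * sigma**2))-RDP.
sigma = 4.0
curve = {float(a): a / (2.0 * sigma ** 2) for a in range(2, 65)}
eps_at_delta = best_dp_epsilon(curve, delta=1e-5)
```

Scanning several orders $\alpha$ and keeping the minimum is a common accounting pattern, since the conversion holds for every order and the best order depends on $\delta$ and the noise level.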