Local Differential Privacy in Graph Neural Networks: a Reconstruction Approach

Karuna Bhaila, Wen Huang, Yongkai Wu, Xintao Wu

TL;DR

The paper tackles node-level privacy in Graph Neural Networks by enforcing Local Differential Privacy (LDP) at the data source, aiming to train effective GNNs without exposing private node features or labels. It introduces RGNN, a reconstruction-based framework that combines Generalized Randomized Response with Feature Sampling (GRR-FS), frequency-estimation-based reconstruction, multi-hop propagation, and learning from label proportions (LLP) to supervise training on clusters. Key contributions include deriving GRR-FS’s privacy budget and error, reconstructing node features and labels from perturbed data, and integrating LLP as an additional supervisory signal, all while achieving $(\epsilon_X+\epsilon_y)$-LDP. Empirically, RGNN shows superior privacy-utility trade-offs compared with baselines like LPGNN across four real-world and two semi-synthetic datasets, using flexible GNN backbones and varying privacy budgets, which demonstrates the practical viability of private GNN training in decentralized settings.
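To make the perturb-then-reconstruct idea concrete, here is a minimal sketch of plain Generalized Randomized Response on a single categorical value, followed by the standard unbiased frequency estimate computed from the perturbed reports. It does not include the paper's feature sampling step (GRR-FS); the function names `grr_probs`, `grr_perturb`, and `estimate_frequencies` are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter

def grr_probs(epsilon, d):
    """Return (p, q) for standard GRR over a domain of size d >= 2.

    p is the probability of reporting the true value, q the probability of
    reporting each specific other value, so p / q = exp(epsilon).
    """
    e = math.exp(epsilon)
    p = e / (e + d - 1)
    q = 1.0 / (e + d - 1)
    return p, q

def grr_perturb(value, domain, epsilon, rng=random):
    """Client side: keep the true value with probability p, otherwise
    report one of the other domain values uniformly at random."""
    p, _ = grr_probs(epsilon, len(domain))
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)

def estimate_frequencies(reports, domain, epsilon):
    """Server side: unbiased estimate of the true value frequencies.

    If lam is the observed frequency of a value among the reports, then
    E[lam] = pi * p + (1 - pi) * q, which gives pi = (lam - q) / (p - q).
    Individual estimates can fall outside [0, 1] on small samples.
    """
    p, q = grr_probs(epsilon, len(domain))
    n = len(reports)
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}
```

A server would call `estimate_frequencies` on the reports collected from a group of nodes, for example a node's multi-hop neighborhood or a cluster. The estimates always sum to 1, but their variance grows as `epsilon` shrinks or the domain grows.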

Abstract

Graph Neural Networks have achieved tremendous success in modeling complex graph data in a variety of applications. However, there are limited studies investigating privacy protection in GNNs. In this work, we propose a learning framework that can provide node privacy at the user level, while incurring low utility loss. We focus on a decentralized notion of Differential Privacy, namely Local Differential Privacy, and apply randomization mechanisms to perturb both feature and label data at the node level before the data is collected by a central server for model training. Specifically, we investigate the application of randomization mechanisms in high-dimensional feature settings and propose an LDP protocol with strict privacy guarantees. Based on frequency estimation in statistical analysis of randomized data, we develop reconstruction methods to approximate features and labels from perturbed data. We also formulate this learning framework to utilize frequency estimates of graph clusters to supervise the training procedure at a sub-graph level. Extensive experiments on real-world and semi-synthetic datasets demonstrate the validity of our proposed model.
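The sub-graph level supervision mentioned above can be pictured as a label-proportion loss: for each cluster, the average of the model's predicted class distributions is compared with the label frequencies estimated for that cluster. The sketch below uses a KL divergence for the comparison, which is an assumption made for illustration and not necessarily the loss used in the paper; the name `cluster_proportion_loss` and the data layout are likewise hypothetical.

```python
import math

def cluster_proportion_loss(pred_probs, clusters, target_props, eps=1e-12):
    """Mean KL(target || mean prediction) over clusters.

    pred_probs:   list of per-node class distributions (lists summing to 1)
    clusters:     dict mapping cluster id -> list of node indices
    target_props: dict mapping cluster id -> estimated label proportions
    eps:          small constant to avoid log(0) (illustrative choice)
    """
    total = 0.0
    for cid, members in clusters.items():
        target = target_props[cid]
        num_classes = len(target)
        # Average the predicted distributions of the nodes in this cluster.
        mean_pred = [
            sum(pred_probs[i][c] for i in members) / len(members)
            for c in range(num_classes)
        ]
        # KL divergence; classes with zero target mass contribute nothing.
        total += sum(
            t * math.log((t + eps) / (m + eps))
            for t, m in zip(target, mean_pred)
            if t > 0
        )
    return total / len(clusters)
```

In a training loop this term would be added to the node-level loss with a weighting coefficient, so that noisy individual labels are complemented by the more stable cluster-level estimates.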

Paper Structure (23 sections, 4 theorems, 12 equations, 4 figures, 2 tables, 3 algorithms)


Key Result

Lemma 1

(Amplification Effect of Sampling [DBLP:conf/ccs/LiQS12]). Let $\mathcal{M}$ denote an algorithm that guarantees $\epsilon'$-DP over some data. Also, let $\mathcal{M}^{\beta}$ denote an algorithm that first samples tuples from the data with probability $\beta$ and then applies $\mathcal{M}$ on the sampl
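The statement of the lemma is cut off in this excerpt. The sketch below therefore assumes the commonly cited form of the sampling amplification result, in which $\mathcal{M}^{\beta}$ satisfies $\ln\left(1+\beta(e^{\epsilon'}-1)\right)$-DP; the paper's exact wording may differ. The function names `amplified_epsilon` and `base_epsilon_for_target` are hypothetical.

```python
import math

def amplified_epsilon(eps_prime, beta):
    """Privacy parameter after sampling with probability beta and then
    applying an eps_prime-DP mechanism (assumed bound, see lead-in)."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    return math.log(1.0 + beta * (math.exp(eps_prime) - 1.0))

def base_epsilon_for_target(eps_target, beta):
    """Inverse of amplified_epsilon: the budget the inner mechanism may
    spend so that the sampled mechanism meets eps_target overall."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    return math.log(1.0 + (math.exp(eps_target) - 1.0) / beta)
```

With `beta = 1` nothing is sampled away and the budget is unchanged. With `beta < 1` the inner mechanism can spend a larger budget while the overall guarantee stays at the target, which is the effect a sampling-based protocol relies on.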

Figures (4)

  • Figure 1: An overview of the proposed framework RGNN
  • Figure 2: Comparison of GNN architectures
  • Figure 3: Hyperparameter study of $\alpha$ and $C$
  • Figure 4: Influence of propagation parameters $K_x$ and $K_y$ on the performance of RGNN

Theorems & Definitions (8)

  • Definition 1
  • Lemma 1
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof