GraphPub: Generation of Differential Privacy Graph with High Availability
Wanghan Xu, Bin Shi, Ao Liu, Jiqiang Zhang, Bo Dong
TL;DR
GraphPub tackles edge privacy in published graphs for GNN tasks by enabling differential privacy while preserving data usability. It combines reverse learning with an encoder–decoder to identify edges whose perturbation minimally harms aggregation, then samples real and false edges under a privacy budget $ε$ to form the published graph $\tilde{A}$. Degree preservation is incorporated by splitting the budget into $ε_1$ and $ε_2$ and applying a Laplace mechanism, improving topology fidelity and sparsity. Experimental results across multiple datasets and GNN architectures show close-to-original accuracy even at low $ε$, along with robustness to edge-privacy attacks and scalable overhead. This approach offers a practical path to privacy-preserving yet usable graph data for GNN research and applications.
Abstract
In recent years, with the rapid development of graph neural networks (GNNs), more and more graph datasets have been published for GNN tasks. However, when an upstream data owner publishes graph data, privacy concerns often arise, because many real-world graphs contain sensitive information such as a person's friend list. Differential privacy (DP) is a common method for protecting privacy, but because of the complex topological structure of graph data, applying DP to graphs often disrupts the message passing and aggregation of GNN models, reducing model accuracy. In this paper, we propose a novel graph edge protection framework, graph publisher (GraphPub), which protects graph topology while keeping the availability of the data essentially unchanged. Through reverse learning and an encoder-decoder mechanism, we search for false edges that do not have a large negative impact on the aggregation of node features, and use them to replace some of the real edges. The modified graph, in which real and false edges are difficult to distinguish, is then published. Extensive experiments show that our framework achieves model accuracy close to that of the original graph with an extremely low privacy budget.
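As a rough illustration of the degree-preservation step mentioned in the TL;DR, the sketch below adds Laplace noise to each node degree using the $ε_1$ share of the budget. It is a minimal sketch of the generic Laplace mechanism, not GraphPub's actual procedure: the function names `laplace_noise` and `noisy_degrees`, the dense adjacency-list input, the default sensitivity of 2, and the clamping of results to valid degrees are all assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_degrees(adj, epsilon_1, sensitivity=2.0, seed=None):
    """Return Laplace-perturbed node degrees for a 0/1 adjacency matrix.

    Assumption: sensitivity defaults to 2 because adding or removing one
    undirected edge changes two entries of the degree vector by 1 each.
    The source does not state which sensitivity GraphPub uses.
    """
    rng = random.Random(seed)
    scale = sensitivity / epsilon_1  # standard Laplace-mechanism scale
    n = len(adj)
    result = []
    for row in adj:
        true_degree = sum(row)
        perturbed = true_degree + laplace_noise(scale, rng)
        # Post-processing: round and clamp to the valid range [0, n - 1].
        result.append(min(n - 1, max(0, round(perturbed))))
    return result


if __name__ == "__main__":
    triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    print(noisy_degrees(triangle, epsilon_1=0.5, seed=42))
```

A smaller $ε_1$ gives a larger noise scale and therefore stronger protection of the degree sequence at the cost of fidelity. Rounding and clamping are post-processing steps, so they do not consume additional privacy budget.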
