HoGS: Homophily-Oriented Graph Synthesis for Local Differentially Private GNN Training
Wen Xu, Zhetao Li, Yong Xiao, Pengpeng Qiao, Mianxiong Dong, Kaoru Ota
TL;DR
HoGS addresses privacy risks in decentralized GNN training by generating a synthetic graph under $ε$-LDP that protects both links and node features. It first collects perturbed adjacency lists and perturbed features from the nodes, then exploits homophily to reconstruct in both directions: topology is inferred by Bayesian estimation using the cosine similarity of noisy features, and features are denoised by a weighted aggregation over probable neighbors. The privacy budget is split into $ε_a=(1-δ)ε$ for links and $ε_f=δε$ for features, so the two collection steps together satisfy $ε$-LDP by sequential composition; topology and feature reconstruction operate only on the perturbed data and are therefore post-processing that consumes no additional budget. On Cora, CiteSeer, and LastFM, HoGS substantially outperforms baselines when its synthetic graph is used to train GCN, GraphSAGE, and GAT models.
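The budget split can be illustrated with a minimal sketch. The summary does not name the perturbation mechanism, so the example assumes per-bit randomized response on each node's adjacency list, a common choice when two adjacency lists count as neighbors if they differ in one bit. The function names `split_budget`, `keep_probability`, and `perturb_adjacency`, the restriction of $δ$ to the open interval (0, 1), and the sample values are all hypothetical.

```python
import math
import random


def split_budget(epsilon, delta):
    """Return (eps_a, eps_f) = ((1 - delta) * epsilon, delta * epsilon)."""
    # Assumption: delta lies strictly between 0 and 1 so both budgets are positive.
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return (1.0 - delta) * epsilon, delta * epsilon


def keep_probability(eps_a):
    """Probability that randomized response reports a bit truthfully."""
    return math.exp(eps_a) / (1.0 + math.exp(eps_a))


def perturb_adjacency(bits, eps_a, rng):
    """Flip each adjacency bit independently with probability 1 - keep_probability."""
    p = keep_probability(eps_a)
    return [b if rng.random() < p else 1 - b for b in bits]


# Hypothetical values: total budget 4.0, a quarter of it spent on features.
eps_a, eps_f = split_budget(epsilon=4.0, delta=0.25)

# Each node perturbs its own adjacency list locally before sending it.
rng = random.Random(0)
noisy_row = perturb_adjacency([0, 1, 0, 0, 1, 0], eps_a, rng)
```

Because `eps_a + eps_f` equals the total budget, releasing both the perturbed adjacency list and the perturbed features costs $ε$ in total; anything computed afterwards from `noisy_row` alone does not touch the raw data.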
Abstract
Graph neural networks (GNNs) have demonstrated remarkable performance in various graph-based machine learning tasks by effectively modeling high-order interactions between nodes. However, training GNNs without protection may leak sensitive personal information in graph data, including links and node features. Local differential privacy (LDP) is an advanced technique for protecting data privacy in decentralized networks. Unfortunately, existing local differentially private GNNs either preserve only link privacy or suffer significant utility loss when preserving both link and node feature privacy. In this paper, we propose an effective LDP framework, called HoGS, which trains GNNs with link and feature protection by generating a synthetic graph. Concretely, HoGS first collects the link and feature information of the graph under LDP, and then utilizes the phenomenon of homophily in graph data to reconstruct the graph structure and node features separately, thereby effectively mitigating the negative impact of LDP on the downstream GNN training. We theoretically analyze the privacy guarantee of HoGS and conduct experiments using the generated synthetic graph as input to various state-of-the-art GNN architectures. Experimental results on three real-world datasets show that HoGS significantly outperforms baseline methods in the accuracy of the trained GNNs.
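To make the homophily idea concrete, the following sketch smooths noisy feature vectors using only the perturbed data. It is not the estimator used by HoGS: as a simplification, it replaces the Bayesian topology step with a plain top-k cosine-similarity neighbor choice, and the functions `cosine` and `denoise_features`, the parameter `k`, and the toy feature vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def denoise_features(noisy, k):
    """Replace each vector by a similarity-weighted mean of itself and its k most similar nodes."""
    n = len(noisy)
    out = []
    for i in range(n):
        # Rank all other nodes by cosine similarity to node i and keep the top k.
        sims = sorted(
            ((cosine(noisy[i], noisy[j]), j) for j in range(n) if j != i),
            reverse=True,
        )[:k]
        # Clip negative similarities so dissimilar nodes contribute nothing.
        weights = [(max(s, 0.0), j) for s, j in sims]
        total = 1.0 + sum(w for w, _ in weights)  # the node itself has weight 1
        vec = list(noisy[i])
        for w, j in weights:
            for d in range(len(vec)):
                vec[d] += w * noisy[j][d]
        out.append([x / total for x in vec])
    return out


# Toy data: nodes 0 and 1 share one latent direction, nodes 2 and 3 another.
noisy = [[1.0, 0.1], [0.9, -0.1], [-0.1, 1.0], [0.1, 0.9]]
smoothed = denoise_features(noisy, k=1)
```

In this toy example, node 0 is paired with node 1, whose opposite-signed noise in the second coordinate largely cancels node 0's. Since the function reads only perturbed vectors, it is a post-processing step in the sense used by the privacy analysis.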
