Unveiling Privacy Vulnerabilities: Investigating the Role of Structure in Graph Data

Hanyang Yuan, Jiarong Xu, Cong Wang, Ziqi Yang, Chunping Wang, Keting Yin, Yang Yang

TL;DR

The paper studies Graph Privacy Leakage via Structure (GPS): privacy risks that arise from graph structure itself, beyond direct attribute leakage. It introduces the Generalized Homophily Ratio (GHRatio) to quantify structure-based privacy leakage, and develops a data-centric private attribute inference attack that combines proximity and structure-role information. For defense, it proposes a learnable graph sampling method for publishing privacy-preserving graphs, guided by a minimax objective against worst-case attacks and by a broader GHRatio-based defense. Experiments on Pokec-n, Pokec-z, and NBA show that the attack outperforms baselines and that the defense achieves better privacy-utility trade-offs than baselines while preserving essential graph properties. Overall, the work combines structure-aware leakage measurement with adaptive edge sampling to reduce privacy risk in published real-world networks.

Abstract

The public sharing of user information opens the door for adversaries to infer private data, leading to privacy breaches and facilitating malicious activities. While numerous studies have concentrated on privacy leakage via public user attributes, the threats associated with the exposure of user relationships, particularly through network structure, are often neglected. This study aims to fill this critical gap by advancing the understanding and protection against privacy risks emanating from network structure, moving beyond direct connections with neighbors to include the broader implications of indirect network structural patterns. To achieve this, we first investigate the problem of Graph Privacy Leakage via Structure (GPS), and introduce a novel measure, the Generalized Homophily Ratio, to quantify the various mechanisms contributing to privacy breach risks in GPS. Based on this insight, we develop a novel graph private attribute inference attack, which acts as a pivotal tool for evaluating the potential for privacy leakage through network structures under worst-case scenarios. To protect users' private data from such vulnerabilities, we propose a graph data publishing method incorporating a learnable graph sampling technique, effectively transforming the original graph into a privacy-preserving version. Extensive experiments demonstrate that our attack model poses a significant threat to user privacy, and our graph data publishing method successfully achieves the optimal privacy-utility trade-off compared to baselines.

Paper Structure

This paper contains 22 sections, 2 theorems, 19 equations, 6 figures, 11 tables, 1 algorithm.

Key Result

Theorem 1

Let $S_{i}$ and $S_{j}$ be two $k$-hop subgraphs induced from nodes $v_i$ and $v_j$. After employing a $K$-layer GNN encoder with a 1-hop graph filter $\Psi(\mathcal{L})$ on each subgraph, the representations of the center nodes $v_i$ and $v_j$ are obtained via a pooling function, i.e., $H^\mathrm{role}$ [...] where $\| \cdot \|_2$ denotes the $L_2$ norm of a matrix or vector, and $\tau$ denotes a constant depending o[...]
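The theorem operates on k-hop subgraphs induced from a center node. As a minimal sketch of that preprocessing step only, the following Python shows one common way to extract such a subgraph with breadth-first search. The function name `k_hop_subgraph`, the adjacency-dict input, and the toy graph are illustrative assumptions, not taken from the paper; the GNN encoder and pooling function are not modeled here.

```python
from collections import deque


def k_hop_subgraph(adj, center, k):
    """Return (nodes, edges) of the subgraph induced by all nodes
    within k hops of `center`.

    `adj` is a hypothetical adjacency dict {node: [neighbors]} for an
    undirected graph; edges are returned as (u, v) pairs with u < v.
    """
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            # Do not expand beyond the k-hop frontier.
            continue
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    nodes = set(dist)
    # Induced subgraph: keep every original edge whose endpoints both survive.
    edges = {(u, v) for u in nodes for v in adj.get(u, ()) if v in nodes and u < v}
    return nodes, edges


# Toy undirected graph (assumed for illustration).
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
nodes_1hop, edges_1hop = k_hop_subgraph(adj, center=0, k=1)
nodes_2hop, edges_2hop = k_hop_subgraph(adj, center=0, k=2)
```

Two nodes whose extracted subgraphs look alike can then be compared by structural role even when they are far apart in the original graph, which is the setting the theorem addresses.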

Figures (6)

  • Figure 1: (a) Illustration of privacy leakage mechanisms: proximity homophily highlighted in pink, structure-role homophily in blue, alongside the privacy protection strategy depicted in orange. (b) The results of private attribute inference attacks accounting for proximity homophily, structure-role homophily, and a combination of both on Pokec-n.
  • Figure 2: Visualization of proximity-related fraction and structure-related fraction distributions on NBA and Pokec-n.
  • Figure 3: Illustration of our data-centric strategy of feeding different data forms (i.e., graph vs subgraphs) into GNN to learn different knowledge (i.e., proximity homophily vs structure-role homophily).
  • Figure 4: An overview of (a) the proposed attribute inference attack and (b) the proposed graph data publishing method.
  • Figure 5: Privacy-utility trade-off of our defensive model and baselines. The upper-left corner represents the ideal performance.
  • ...and 1 more figure

Theorems & Definitions (4)

  • Definition 1: Homophily indicator
  • Definition 2: Generalized Homophily Ratio (GHRatio)
  • Theorem 1
  • Lemma 1
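
The formal definitions of the homophily indicator and GHRatio are not reproduced in this summary. For orientation only, the sketch below computes the standard edge homophily ratio, i.e., the fraction of edges whose endpoints share the same attribute value. This is the classical proximity-based notion, not the paper's GHRatio; the function name, edge list, and labels are assumptions made for illustration.

```python
def edge_homophily_ratio(edges, labels):
    """Fraction of edges whose two endpoints carry the same label.

    This is the standard edge homophily ratio, shown as background;
    it is NOT the paper's Generalized Homophily Ratio.
    """
    edges = list(edges)
    if not edges:
        raise ValueError("edge list must be non-empty")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy data (assumed): five undirected edges and a binary private attribute.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
labels = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b"}
ratio = edge_homophily_ratio(edges, labels)  # 3 of 5 edges match
```

A high ratio means that a node's private attribute can often be guessed from its direct neighbors, which is the proximity mechanism the paper moves beyond.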