
Can Graph Neural Networks Expose Training Data Properties? An Efficient Risk Assessment Approach

Hanyang Yuan, Jiarong Xu, Renhong Huang, Mingli Song, Chunping Wang, Yang Yang

TL;DR

This work studies graph property inference attacks to identify the risk that shared models leak sensitive property information, and proposes a novel selection mechanism to ensure that the retained approximated models achieve high diversity and low error.

Abstract

Graph neural networks (GNNs) have attracted considerable attention due to their diverse applications. However, the scarcity and limited quality of graph data make them difficult to train in practical settings. To develop effective GNNs, companies and researchers often seek external collaboration. Yet directly sharing data raises privacy concerns, motivating data owners to train GNNs on their private graphs and share the trained models instead. Unfortunately, these models may still inadvertently disclose sensitive properties of their training graphs (e.g., the average default rate in a transaction network), leading to severe consequences for data owners. In this work, we study graph property inference attacks to identify the risk of sensitive property information leakage from shared models. Existing approaches typically train numerous shadow models to develop such attacks, which is computationally intensive and impractical. To address this issue, we propose an efficient graph property inference attack that leverages model approximation techniques. Our method requires training only a small set of models on graphs, while generating a sufficient number of approximated shadow models for attacks. To enhance diversity while reducing errors in the approximated models, we apply edit distance to quantify the diversity within a group of approximated models and introduce a theoretically guaranteed criterion to evaluate each model's error. We then propose a novel selection mechanism to ensure that the retained approximated models achieve high diversity and low error. Extensive experiments across six real-world scenarios demonstrate our method's substantial improvement, with average increases of 2.7% in attack accuracy and 4.1% in ROC-AUC, while being 6.5$\times$ faster than the best baseline.
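To make the idea of selecting approximated models for both diversity and low error more concrete, here is a minimal Python sketch. It is not the paper's selection mechanism or error criterion: the greedy rule, the `trade_off` weight, the function names, and the use of an edge-set symmetric difference as a simplified edit distance are all assumptions made for illustration, and the error values are placeholder numbers.

```python
def edge_edit_distance(edges_a, edges_b):
    """Count edge insertions/deletions needed to turn one edge set into the other.

    Assumption: both graphs share a node set and each undirected edge is stored
    once as a sorted tuple, so the symmetric difference is a simple edit distance.
    """
    return len(set(edges_a) ^ set(edges_b))


def greedy_select(graphs, errors, k, trade_off=1.0):
    """Greedily keep up to k candidates that are far apart and have low error.

    graphs:    list of edge collections, one per augmented graph (hypothetical input)
    errors:    list of per-candidate error estimates (placeholder values)
    trade_off: hypothetical weight on the error penalty
    """
    remaining = set(range(len(graphs)))
    # Start from the candidate with the lowest estimated error.
    first = min(sorted(remaining), key=lambda i: errors[i])
    selected = [first]
    remaining.remove(first)

    while remaining and len(selected) < k:
        def score(i):
            # Diversity: distance to the closest already-selected graph.
            diversity = min(edge_edit_distance(graphs[i], graphs[j]) for j in selected)
            return diversity - trade_off * errors[i]

        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


graphs = [
    {(0, 1), (1, 2)},
    {(0, 1), (1, 2), (2, 3)},
    {(3, 4)},
    {(3, 4), (4, 5)},
]
errors = [0.1, 0.2, 0.3, 5.0]
picked = greedy_select(graphs, errors, k=3)
```

In this toy input the fourth graph is far from the first but carries a large error estimate, so the greedy rule passes over it; raising or lowering `trade_off` shifts that balance.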

Paper Structure

This paper contains 56 sections, 2 theorems, 20 equations, 3 figures, 8 tables, 1 algorithm.

Key Result

Theorem 3.1

Given the GNN parameter $\theta^\mathrm{ref}$ on ${G}^\mathrm{ref}$, the removed nodes ${V}^\mathrm{R}$, the removed edges ${E}^\mathrm{R}$ and the influenced nodes ${V}^\mathrm{I}$, and assuming $\ell$ is twice-differentiable everywhere and convex, we have where $\nabla$ denotes the gradient and $\nabla^2$ denotes the Hessian. $\mathcal{L}(\theta^\mathrm{ref}; G^\mathrm{aug})=\sum_{v \in V/V^\mathrm{R}} \ell(\theta^\m

Figures (3)

  • Figure 1: Illustrations of (a) conventional graph property inference attacks and (b) the proposed attack, with yellow shading indicating model training, the main source of computational cost.
  • Figure 2: (a) Evaluation of the necessity of considering diversity while minimizing the approximation error. (b) and (c) Impact of the number of augmented graphs (per reference graph) and reference graphs on attack accuracy, respectively. (d) Accuracy and runtime comparison in black-box settings.
  • Figure 3: Comparison of average attack accuracy and runtime (seconds) on: (a)-(c) other GNNs, including GAT, GCN, and SGC; (d) a large-scale dataset, Pokec-100M.

Theorems & Definitions (4)

  • Definition 1: Influenced nodes
  • Theorem 3.1: GNN model approximation
  • Theorem 3.2: Approximation error bound
  • Definition 2: Diversity for $\mathcal{G}^\mathrm{aug}$