Can Graph Neural Networks Expose Training Data Properties? An Efficient Risk Assessment Approach

Hanyang Yuan; Jiarong Xu; Renhong Huang; Mingli Song; Chunping Wang; Yang Yang

TL;DR

This work studies graph property inference attacks to identify the risk of sensitive property information leaking from shared models, and proposes a novel selection mechanism to ensure that the retained approximated models achieve high diversity and low error.

Abstract

Graph neural networks (GNNs) have attracted considerable attention due to their diverse applications. However, the scarcity and quality limitations of graph data present challenges to their training process in practical settings. To facilitate the development of effective GNNs, companies and researchers often seek external collaboration. Yet, directly sharing data raises privacy concerns, motivating data owners to train GNNs on their private graphs and share the trained models. Unfortunately, these models may still inadvertently disclose sensitive properties of their training graphs (e.g., the average default rate in a transaction network), leading to severe consequences for data owners. In this work, we study graph property inference attacks to identify the risk of sensitive property information leaking from shared models. Existing approaches typically train numerous shadow models to develop such attacks, which is computationally intensive and impractical. To address this issue, we propose an efficient graph property inference attack that leverages model approximation techniques. Our method requires training only a small set of models on graphs, while generating a sufficient number of approximated shadow models for attacks. To enhance diversity while reducing errors in the approximated models, we apply edit distance to quantify the diversity within a group of approximated models and introduce a theoretically guaranteed criterion to evaluate each model's error. Subsequently, we propose a novel selection mechanism to ensure that the retained approximated models achieve high diversity and low error. Extensive experiments across six real-world scenarios demonstrate our method's substantial improvement, with average increases of 2.7% in attack accuracy and 4.1% in ROC-AUC, while being 6.5$\times$ faster than the best baseline.
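The selection idea in the abstract (retain approximated models that are both diverse and low in error) can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes each approximated model is identified by a hypothetical set of graph edits (here, removed edges), that diversity is the size of the symmetric difference between two edit sets, and that a per-model error estimate is already available. The names `edit_distance`, `select_models`, and `error_weight` are illustrative only.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical representation: one edit is an edge (u, v) removed from a reference graph.
Edit = Tuple[int, int]


def edit_distance(a: FrozenSet[Edit], b: FrozenSet[Edit]) -> int:
    """Count the edits that appear in exactly one of the two edit sets."""
    return len(a ^ b)


def select_models(
    edit_sets: Sequence[FrozenSet[Edit]],
    errors: Sequence[float],
    k: int,
    error_weight: float = 1.0,
) -> List[int]:
    """Greedily pick k candidate indices, trading diversity against error.

    Assumption: the score is (minimum edit distance to the already
    selected candidates) minus error_weight * (estimated error).
    """
    if len(edit_sets) != len(errors):
        raise ValueError("edit_sets and errors must have the same length")
    if not 0 < k <= len(edit_sets):
        raise ValueError("k must be between 1 and the number of candidates")

    # Seed the selection with the lowest-error candidate.
    selected = [min(range(len(errors)), key=lambda i: errors[i])]

    while len(selected) < k:
        best_index, best_score = -1, float("-inf")
        for i in range(len(edit_sets)):
            if i in selected:
                continue
            diversity = min(edit_distance(edit_sets[i], edit_sets[j]) for j in selected)
            score = diversity - error_weight * errors[i]
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
    return selected


if __name__ == "__main__":
    candidates = [
        frozenset({(0, 1)}),
        frozenset({(0, 1), (1, 2)}),
        frozenset({(3, 4), (5, 6)}),
        frozenset({(0, 1)}),  # duplicate edits with a large error estimate
    ]
    estimated_errors = [0.1, 0.2, 0.3, 5.0]
    print(select_models(candidates, estimated_errors, k=2))
```

In this toy setup, the greedy rule prefers candidates whose edits overlap little with those already chosen, and it penalizes candidates with large error estimates. The weighting between the two terms is a free design choice in this sketch.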
