Large Language Models for Link Stealing Attacks Against Graph Neural Networks

Faqian Guan, Tianqing Zhu, Hui Sun, Wanlei Zhou, Philip S. Yu

TL;DR

This work demonstrates that Large Language Models can significantly enhance link stealing attacks against Graph Neural Networks by effectively combining node textual features with posterior probabilities. By designing task-specific prompts and fine-tuning a single LLM across multiple datasets, the approach achieves state-of-the-art white-box and competitive black-box performance, showcasing strong cross-dataset generalization and transferability across LLM and GNN variants. The theoretical analysis and extensive experiments reveal that incorporating textual information yields measurable gains and that a single LLM can operate across diverse datasets, improving the realism and practicality of privacy attacks on graphs. The results highlight important privacy risks in GNN deployments and suggest a need for robust defenses against text-informed, cross-dataset privacy breaches in graph-structured data.

Abstract

Graph data contains rich node features and unique edge information and has been applied across various domains, such as citation networks and recommendation systems. Graph Neural Networks (GNNs) are specialized for handling such data and have shown impressive performance in many applications. However, GNNs may contain sensitive information and be susceptible to privacy attacks. For example, link stealing is a type of attack in which attackers infer whether two nodes are linked. Previous link stealing attacks primarily relied on posterior probabilities from the target GNN model, neglecting the significance of node features. Additionally, variations in node classes across different datasets lead to different dimensions of posterior probabilities. Handling these varying data dimensions poses a challenge to using a single model to conduct link stealing attacks on different datasets. To address these challenges, we introduce Large Language Models (LLMs) to perform link stealing attacks on GNNs. LLMs can effectively integrate textual features and exhibit strong generalizability, enabling attacks to handle diverse data dimensions across various datasets. We design two distinct LLM prompts to effectively combine the textual features and posterior probabilities of graph nodes. Through these designed prompts, we fine-tune the LLM to adapt it to the link stealing attack task. Furthermore, we fine-tune the LLM on multiple datasets, enabling it to learn features from different datasets simultaneously. Experimental results show that our approach significantly improves on existing link stealing attacks in both white-box and black-box scenarios. Our method can execute link stealing attacks across different datasets using only a single model, making link stealing attacks more applicable to real-world scenarios.
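The abstract describes prompts that combine a node pair's textual features with the posterior probability vectors returned by the target GNN. A minimal sketch of what such a prompt-construction step might look like, assuming a white-box setting where posteriors are available; the function name `build_link_prompt` and the template wording are illustrative, not the paper's actual prompts:

```python
# Hypothetical sketch of the prompt construction step: merge each node pair's
# text and GNN posterior probabilities into a single prompt for the attack LLM.

def build_link_prompt(text_a, post_a, text_b, post_b):
    """Format one node pair as a white-box-style prompt exposing both the
    nodes' textual features and their posterior probability vectors."""
    fmt = lambda p: "[" + ", ".join(f"{x:.3f}" for x in p) + "]"
    return (
        "Node A text: " + text_a + "\n"
        "Node A posteriors: " + fmt(post_a) + "\n"
        "Node B text: " + text_b + "\n"
        "Node B posteriors: " + fmt(post_b) + "\n"
        "Question: Are node A and node B linked? Answer yes or no."
    )

prompt = build_link_prompt(
    "A survey of graph neural networks.", [0.91, 0.05, 0.04],
    "Privacy attacks on machine learning.", [0.10, 0.72, 0.18],
)
```

Because the posteriors are serialized as text rather than consumed as a fixed-size vector, the same prompt format works regardless of how many classes (and thus posterior dimensions) a dataset has, which is the cross-dataset property the abstract emphasizes.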

Paper Structure

This paper contains 38 sections, 5 equations, 12 figures, 9 tables, 1 algorithm.

Figures (12)

  • Figure 1: Real-world instance of link stealing attacks in Graph Neural Networks.
  • Figure 2: Basic flow of link stealing attacks. The service provider trains the model and deploys it on the web for queries. The attacker queries the model using existing knowledge to obtain posterior probabilities. The attacker then conducts an attack using the obtained posterior probabilities and the original knowledge.
  • Figure 3: Overview of proposed link stealing attack method. The Data Processing step involves creating node pairs containing features and posterior probabilities of the nodes. These pairs serve as input features for constructing the LLM attack model. In the Prompt Design step, different prompts are created for white-box and black-box settings. These prompts include information about the node pairs. In the Fine-tuning step, the LLM is fine-tuned using the designed prompts. In the Link Stealing step, the fine-tuned LLM is used to determine whether there is a connection between the node pairs.
  • Figure 4: Response to link stealing attacks by the original LLM model in the black-box setting.
  • Figure 5: Our prompt designs for link stealing attacks.
  • ...and 7 more figures
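The Figure 3 caption outlines a four-step pipeline: process node pairs, build prompts, fine-tune the LLM, and query it to decide linkage. A toy end-to-end sketch of the final inference step under that flow; `infer_links`, `query_target_gnn`, and `attack_llm` are illustrative stand-ins, not the paper's actual API:

```python
# Hypothetical sketch of the link stealing step from Figure 3: query the
# deployed target GNN for posteriors, assemble a prompt per candidate node
# pair, and let the (fine-tuned) attack LLM answer yes/no on linkage.

def infer_links(pairs, node_text, query_target_gnn, attack_llm):
    """Map each candidate (a, b) node pair to a linked / not-linked prediction."""
    preds = {}
    for a, b in pairs:
        prompt = (
            f"Node A: {node_text[a]}; posteriors: {query_target_gnn(a)}\n"
            f"Node B: {node_text[b]}; posteriors: {query_target_gnn(b)}\n"
            "Are these two nodes linked? Answer yes or no."
        )
        # Parse the LLM's free-text yes/no answer into a boolean prediction.
        preds[(a, b)] = attack_llm(prompt).strip().lower().startswith("yes")
    return preds

# Toy stand-ins for demonstration only.
posteriors = {0: [0.9, 0.1], 1: [0.85, 0.15], 2: [0.1, 0.9]}
texts = {0: "GNN survey", 1: "Graph learning", 2: "Cooking recipes"}
fake_llm = lambda p: "yes" if "Graph" in p else "no"  # trivial mock, not a real LLM
preds = infer_links([(0, 1), (0, 2)], texts, posteriors.__getitem__, fake_llm)
```

In the black-box setting described in the abstract, the same loop would use the paper's second prompt variant, omitting or replacing the posterior fields when the target model's probabilities are unavailable.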