
On Stealing Graph Neural Network Models

Marcin Podhajski, Jan Dubiński, Franziska Boenisch, Adam Dziedzic, Agnieszka Pręgowska, Tomasz P. Michalak

TL;DR

This work demonstrates that graph neural network (GNN) model stealing remains feasible even under severe query limits. The authors propose a two-stage attack. First, the adversary obtains an encoder backbone locally, without querying the victim: the encoder is randomly initialized in the inductive setting and SSL-trained in the transductive setting. Second, the adversary spends a fixed query budget on carefully selected nodes to train a surrogate head, forming a high-fidelity replica $f_s$ of the victim $f_v$. Across eight real-world datasets, the approach achieves strong accuracy and fidelity under small query budgets $q_n$ (the reported figures evaluate $q_n \in \{10, 25, 50, 100, 500\}$), outperforming prior methods in both inductive and transductive scenarios. Hard-label defenses prove insufficient, highlighting significant security risks for GNN deployment and the need for robust countermeasures.

Abstract

Current graph neural network (GNN) model-stealing methods rely heavily on queries to the victim model, assuming no hard query limits. However, in reality, the number of allowed queries can be severely limited. In this paper, we demonstrate how an adversary can extract a GNN with very limited interactions with the model. Our approach first enables the adversary to obtain the model backbone without making direct queries to the victim model and then to strategically utilize a fixed query limit to extract the most informative data. Experiments on eight real-world datasets demonstrate the effectiveness of the attack, even under a very restricted query limit and with a defense against model extraction in place. Our findings underscore the need for robust defenses against GNN model extraction threats.
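The two stages described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the encoder is a single randomly initialized GCN-style layer (mirroring the random-initialization variant mentioned for the inductive setting), the query selection is a farthest-point heuristic in embedding space (an assumed stand-in, since the selection strategy is not specified here), and the head is a plain softmax regression. The names `random_gcn_encoder`, `select_queries`, `train_head`, `steal`, and `victim_api` are hypothetical.

```python
import numpy as np

def random_gcn_encoder(adj, feats, hidden_dim=16, seed=0):
    """Stage 1: local encoder with random, untrained weights (no victim queries)."""
    rng = np.random.default_rng(seed)
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    w = rng.normal(size=(feats.shape[1], hidden_dim)) / np.sqrt(feats.shape[1])
    return np.maximum(a_norm @ feats @ w, 0.0)         # one propagation step + ReLU

def select_queries(emb, budget, seed=0):
    """Pick `budget` distinct nodes by farthest-point selection (assumed strategy)."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(emb.shape[0]))]
    dist = np.linalg.norm(emb - emb[chosen[0]], axis=1)
    dist[chosen[0]] = -np.inf                          # never re-select a node
    while len(chosen) < budget:
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(emb - emb[nxt], axis=1))
        dist[nxt] = -np.inf
    return chosen

def train_head(emb, labels, num_classes, epochs=200, lr=0.5):
    """Stage 2: fit a softmax-regression head on the queried hard labels."""
    w = np.zeros((emb.shape[1], num_classes))
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = emb @ w
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        w -= lr * emb.T @ (probs - onehot) / len(labels)
    return w

def steal(adj, feats, victim_api, budget, num_classes):
    """Run both stages; `victim_api` (hypothetical) maps node ids to hard labels."""
    emb = random_gcn_encoder(adj, feats)
    queried = select_queries(emb, budget)
    labels = np.asarray(victim_api(queried))           # the only victim interaction
    w = train_head(emb[queried], labels, num_classes)
    return (emb @ w).argmax(axis=1), queried
```

The point of the structure is that the victim is contacted exactly once, with `budget` node ids, and everything else runs locally on the adversary's side.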

Paper Structure

This paper contains 20 sections, 8 figures, 13 tables.

Figures (8)

  • Figure 1: Standard GNN model stealing vs. our approach. Conventional model stealing methods extract the entire GNN through extensive querying of the victim model API. Our method divides this process into stages, focusing on maximizing the stealing outcome within a restricted query limit. First, we show that the adversary can obtain the encoder backbone locally, without any interaction with the victim API. Then, the adversary performs query selection using the representations from the extracted encoder and extracts the network head via selective querying. This enables effective model stealing under strict query budgets, demonstrating that the GNN model stealing threat is significantly more severe than previously assumed.
  • Figure 2:
  • Figure 3: t-SNE projections of embeddings on the Citeseer dataset (transductive setting) and the Physics dataset (inductive setting).
  • Figure 4: Accuracy and fidelity (inductive setting, target: SAGE, surrogate: GCN) for $q_n \in \{10, 25, 50, 100, 500\}$ on Physics, Photo, and Reddit. Methods marked with * assume access to victim embeddings (weaker threat model).
  • Figure 5: Accuracy and fidelity (inductive setting, target: SAGE, surrogate: GCN) for $q_n \in \{10, 25, 50, 100, 500\}$ on WikiCS and CS. Methods marked with * assume access to victim embeddings (weaker threat model).
  • ...and 3 more figures
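
The figure captions above report accuracy and fidelity without defining them. A minimal sketch, assuming the definitions commonly used in the model-extraction literature (accuracy is agreement of the surrogate with ground-truth labels, fidelity is agreement of the surrogate with the victim's predictions); the function names and the toy labels are illustrative only.

```python
import numpy as np

def accuracy(surrogate_pred, true_labels):
    """Fraction of nodes where the surrogate matches the ground-truth label."""
    return float(np.mean(np.asarray(surrogate_pred) == np.asarray(true_labels)))

def fidelity(surrogate_pred, victim_pred):
    """Fraction of nodes where the surrogate matches the victim's prediction."""
    return float(np.mean(np.asarray(surrogate_pred) == np.asarray(victim_pred)))

# Toy labels for four nodes (made up for illustration).
true_labels = [0, 1, 1, 0]
victim_pred = [0, 1, 0, 0]      # the victim itself errs on the third node
surrogate_pred = [0, 1, 0, 1]

print(accuracy(surrogate_pred, true_labels))   # 0.5
print(fidelity(surrogate_pred, victim_pred))   # 0.75
```

The two numbers can diverge: a surrogate that copies the victim's mistakes gains fidelity without gaining accuracy, which is why captions like these report both.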