GENIE: Watermarking Graph Neural Networks for Link Prediction

Venkata Sai Pranav Bachina; Ankit Gangwal; Aaryan Ajay Sharma; Charu Sharma

TL;DR

This work tackles IP protection for graph neural networks applied to link prediction by introducing GENIE, a dynamic, backdoor-based watermarking framework. GENIE generates watermark data and embeds it into either node-representation or subgraph-based LP models, pairing it with Dynamic Watermark Thresholding (DWT) to deliver statistically confident ownership verification. Extensive experiments across seven real-world datasets and four GNN architectures show GENIE preserves model utility while enabling robust ownership verification, and it remains resilient to a broad suite of watermark-removal and model-extraction attacks. The approach also addresses ownership piracy and adaptive attackers, making GENIE practical for deploying watermark-protected GNN LP models in MLaaS environments.

Abstract

Graph Neural Networks (GNNs) have become invaluable intellectual property in graph-based machine learning. However, their vulnerability to model stealing attacks when deployed within Machine Learning as a Service (MLaaS) necessitates robust Ownership Demonstration (OD) techniques. Watermarking is a promising OD framework for Deep Neural Networks, but existing methods fail to generalize to GNNs due to the non-Euclidean nature of graph data. Previous works on GNN watermarking have primarily focused on node and graph classification, overlooking Link Prediction (LP). In this paper, we propose GENIE (watermarking Graph nEural Networks for lInk prEdiction), the first-ever scheme to watermark GNNs for LP. GENIE creates a novel backdoor for both node-representation and subgraph-based LP methods, utilizing a unique trigger set and a secret watermark vector. Our OD scheme is equipped with Dynamic Watermark Thresholding (DWT), ensuring high verification probability (>99.99%) while addressing practical issues in existing watermarking schemes. We extensively evaluate GENIE across 4 model architectures (i.e., SEAL, GCN, GraphSAGE and NeoGNN) and 7 real-world datasets. Furthermore, we validate the robustness of GENIE against 11 state-of-the-art watermark removal techniques and 3 model extraction attacks. We also show GENIE's resilience against ownership piracy attacks. Finally, we discuss a defense strategy to counter adaptive attacks against GENIE.
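
As a rough illustration of threshold-based ownership verification, the sketch below compares a suspect model's AUC on the watermark data against a threshold. It is not the paper's DWT procedure, whose details are not given in this summary. The midpoint rule, the function names `pick_threshold` and `claims_ownership`, and all AUC values are assumptions made for the example.

```python
import numpy as np

def pick_threshold(auc_clean, auc_wm):
    """Pick a threshold between clean and watermarked AUC samples.

    Assumption: the midpoint of the two sample means is used purely for
    illustration; it stands in for the paper's DWT, which is not shown here.
    """
    auc_clean = np.asarray(auc_clean, dtype=float)
    auc_wm = np.asarray(auc_wm, dtype=float)
    t = 0.5 * (auc_clean.mean() + auc_wm.mean())
    # Empirical false positive rate: clean models scoring at or above t.
    fpr = float(np.mean(auc_clean >= t))
    # Empirical false negative rate: watermarked models scoring below t.
    fnr = float(np.mean(auc_wm < t))
    return t, fpr, fnr

def claims_ownership(auc_suspect, t):
    """Flag a suspect model as watermarked if its AUC on the watermark data reaches t."""
    return auc_suspect >= t

# Hypothetical AUC values on the watermark data (not results from the paper).
auc_clean = [0.48, 0.52, 0.50, 0.47, 0.53]
auc_wm = [0.97, 0.99, 0.98, 0.96, 1.00]
t, fpr, fnr = pick_threshold(auc_clean, auc_wm)
```

When the two AUC samples are well separated, as in these made-up values, both empirical error rates are zero and any suspect model scoring near the watermarked range is flagged.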

Paper Structure

This paper contains 41 sections, 16 equations, 6 figures, and 38 tables.

Introduction
Background
Graph Neural Networks
Link Prediction (LP)
Backdoor attacks and watermarking
Related works
Threat model
GENIE
Watermark data generation
GENIE for node representation-based method
GENIE for subgraph-based method
Watermark embedding
Watermark verification
Non-trivial ownership
Dynamic Watermark Thresholding (DWT)
...and 26 more sections

Figures (6)

Figure 1: The predictions by a watermarked GNN on a graph without watermark (i.e., Link doesn't exist) should be opposite to that on a graph injected with watermark (i.e., Link exists).
Figure 2: A representative illustration of watermark graph $\mathcal{G}_{wm}$ generation from the original graph $\mathcal{G}$ for node representation-based watermark data generation.
Figure 3: A depiction of generating $\mathcal{D}_{wm}$ for subgraph-based methods. Here, the original subgraph $\mathcal{G}_i$ is created from an arbitrary pair of nodes $(u, v)$ with label $y_i \in \{0,~1\}$. In the modified subgraph, the original feature vectors $\mathbf{x_v}$ are replaced with the watermark vector $\mathbf{w}$, and the subgraph label is replaced with $\overline{y}_i$.
Figure 4: KDEs of $AUC_{\mathcal{D}_{wm}}^{\mathcal{M}_{clean}}$ and $AUC_{\mathcal{D}_{wm}}^{\mathcal{M}_{wm}}$ for the dataset-architecture pair Yeast-SEAL, along with the corresponding watermark threshold $t$ with $n=10^6$. Here, $\alpha$ and $\beta$ denote the FPR and FNR (not visible since they are $0$), while $\mu_{c}$ and $\mu_{w}$ denote the population means of $AUC_{\mathcal{D}_{wm}}^{\mathcal{M}_{clean}}$ and $AUC_{\mathcal{D}_{wm}}^{\mathcal{M}_{wm}}$, respectively.
Figure D.1: A representative example of $\mathcal{M}_{adv}$'s performance trajectory on $\mathcal{D}_{wm}$, $\mathcal{D}_{test}$, and the pirated trigger set across training epochs during the embedding of a pirated watermark.
...and 1 more figure
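
The subgraph modification described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that every node's feature vector in the subgraph is overwritten with the watermark vector and that $\overline{y}_i = 1 - y_i$ (a flipped label, in line with the opposite predictions described in Figure 1). The function name `make_watermark_sample` and the toy inputs are hypothetical.

```python
import numpy as np

def make_watermark_sample(features, label, w):
    """Build one watermark sample from a subgraph's node features and label.

    Assumptions (not confirmed by the source): all node feature vectors are
    replaced with w, and the watermark label is the flipped binary label.
    """
    features = np.asarray(features, dtype=float)
    w = np.asarray(w, dtype=float)
    if w.shape != (features.shape[1],):
        raise ValueError("watermark vector must match the feature dimension")
    # Overwrite every node's feature vector with the watermark vector.
    wm_features = np.tile(w, (features.shape[0], 1))
    # Flip the binary link label: 0 becomes 1 and 1 becomes 0.
    wm_label = 1 - label
    return wm_features, wm_label

# Toy subgraph with 3 nodes and 2-dimensional features (hypothetical values).
X = np.arange(6).reshape(3, 2)
w = np.array([0.5, -0.5])
X_wm, y_wm = make_watermark_sample(X, 1, w)
```

The graph structure itself is left untouched in this sketch; only the features and the label change.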
