CITED: A Decision Boundary-Aware Signature for GNNs Towards Model Extraction Defense

Bolin Shen, Md Shamim Seraj, Zhan Cheng, Shayok Chakraborty, Yushun Dong

TL;DR

CITED is a signature-based ownership verification framework for GNNs and, according to the authors, the first of its kind to verify ownership at both the embedding and label levels. It neither harms downstream performance nor introduces auxiliary models that reduce efficiency, and the authors report that it outperforms all watermarking and fingerprinting approaches they compare against.

Abstract

Graph neural networks (GNNs) have demonstrated superior performance in various applications, such as recommendation systems and financial risk management. However, deploying large-scale GNN models locally is particularly challenging for users, as it requires significant computational resources and extensive property data. Consequently, Machine Learning as a Service (MLaaS) has become increasingly popular, offering a convenient way to deploy and access various models, including GNNs. Yet an emerging threat known as Model Extraction Attacks (MEAs) presents significant risks, as adversaries can readily obtain surrogate GNN models exhibiting similar functionality. Specifically, attackers repeatedly query the target model with subgraph inputs and collect the corresponding responses. These input-output pairs are then used to train surrogate models at minimal cost. Many techniques have been proposed to defend against MEAs, but most are limited to a specific output level (e.g., embedding or label) and suffer from inherent technical drawbacks. To address these limitations, we propose a novel ownership verification framework, CITED, which is the first of its kind to achieve ownership verification at both the embedding and label levels. Moreover, CITED is a signature-based method that neither harms downstream performance nor introduces auxiliary models that reduce efficiency, while still outperforming all watermarking and fingerprinting approaches. Extensive experiments demonstrate the effectiveness and robustness of the CITED framework. Code is available at: https://github.com/LabRAI/CITED.
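
The query-and-train loop that the abstract attributes to a model extraction attack can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper or its repository: the target is a single linear GCN-style layer, `target_api` is a hypothetical label-only endpoint, the query subgraphs are random, and the surrogate is a plain softmax regression trained with gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_adj(A):
    """Symmetrically normalize A + I (standard GCN-style propagation matrix)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def random_subgraph(n_nodes, n_feats, edge_prob=0.3):
    """Hypothetical query generator: random features on a random undirected graph."""
    upper = np.triu(rng.random((n_nodes, n_nodes)) < edge_prob, k=1).astype(float)
    return rng.normal(size=(n_nodes, n_feats)), upper + upper.T

def target_api(X, A, W_secret):
    """Hypothetical MLaaS endpoint: exposes hard labels only, never the weights."""
    return (normalize_adj(A) @ X @ W_secret).argmax(axis=1)

d, c = 8, 3
W_secret = rng.normal(size=(d, c))  # owner's private parameters

# Step 1: the attacker queries the target with subgraphs and stores the responses.
feats, labels = [], []
for _ in range(40):
    X, A = random_subgraph(20, d)
    feats.append(normalize_adj(A) @ X)           # attacker-side propagated features
    labels.append(target_api(X, A, W_secret))    # responses returned by the target
H, y = np.vstack(feats), np.concatenate(labels)

# Step 2: the attacker fits a surrogate on the collected input-output pairs.
W_sur = np.zeros((d, c))
Y = np.eye(c)[y]                                 # one-hot encoding of the responses
for _ in range(500):
    logits = H @ W_sur
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    W_sur -= 2.0 * H.T @ (P - Y) / len(y)        # gradient step on cross-entropy

# Step 3: fidelity = label agreement between surrogate and target on fresh queries.
X_t, A_t = random_subgraph(200, d)
sur_labels = (normalize_adj(A_t) @ X_t @ W_sur).argmax(axis=1)
fidelity = float(np.mean(sur_labels == target_api(X_t, A_t, W_secret)))
print(f"surrogate/target label agreement: {fidelity:.2f}")
```

The sketch covers only the label-level output; an embedding-level variant would have the endpoint return the propagated representation instead of its argmax.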

Paper Structure

This paper contains 63 sections, 5 theorems, 30 equations, 16 figures, and 21 tables.

Key Result

Theorem 4.1

Let ${\bm{e}}_a = {\bm{f}}_{{\bm{w}}}({\bm{X}},{\bm{A}})$ and ${\bm{e}}_b = {\bm{f}}_{{\bm{w}}+{\bm{u}}}({\bm{X}},{\bm{A}})$, where ${\bm{u}} = \operatorname{vec}(\{{\bm{U}}_1,\dots,{\bm{U}}_L\})$. Suppose the random matrices $\{{\bm{U}}_i\}_{i=1}^{L}$ are mutually independent and satisfy $\mathbb{E}[\,\cdots\,]$ […]. For all $\lambda \ge \Delta_{\mathcal{G}}$, the event $D_p \le \Delta_{\mathcal{G}}$ implies $\Pr(D_p < \lambda \cdots)$ […]
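
The setup of the theorem, comparing embeddings ${\bm{e}}_a$ from weights ${\bm{w}}$ with embeddings ${\bm{e}}_b$ from perturbed weights ${\bm{w}}+{\bm{u}}$, can be explored numerically with a small sketch. Several pieces are assumptions for illustration only: the two-layer ReLU GCN in `gcn_embed`, the zero-mean Gaussian perturbations, the random graph, and the use of the mean per-node Euclidean shift as a stand-in distance. This is not the paper's $D_p$; it is the transport cost of the identity coupling, which upper-bounds the 1-Wasserstein distance between the two empirical embedding distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gcn_embed(X, A, weights):
    """Toy L-layer GCN: H <- ReLU(A_norm @ H @ W) for each weight matrix W."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    H = X
    for W in weights:
        H = np.maximum(A_norm @ H @ W, 0.0)
    return H

# Hypothetical graph, features, and weights (not from the paper).
n, d = 30, 16
upper = np.triu(rng.random((n, n)) < 0.2, k=1).astype(float)
A = upper + upper.T
X = rng.normal(size=(n, d))
weights = [rng.normal(scale=0.3, size=(d, d)) for _ in range(2)]

e_a = gcn_embed(X, A, weights)  # embeddings under the original weights w

def mean_shift(sigma, trials=50):
    """Average per-node distance ||e_a - e_b|| over independent perturbations U_i."""
    shifts = []
    for _ in range(trials):
        perturbed = [W + rng.normal(scale=sigma, size=W.shape) for W in weights]
        e_b = gcn_embed(X, A, perturbed)  # embeddings under w + u
        shifts.append(np.linalg.norm(e_a - e_b, axis=1).mean())
    return float(np.mean(shifts))

small, large = mean_shift(0.01), mean_shift(0.1)
print(f"sigma=0.01: {small:.4f}   sigma=0.10: {large:.4f}")
```

Under these assumptions, a larger perturbation scale moves the embeddings further on average, which is the qualitative behavior a perturbation bound of this kind is meant to control.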

Figures (16)

  • Figure 1: Performance of different defense models $f_D$ on the downstream task as the number of embedded ownership verification flags increases.
  • Figure 2: Left: Inference time comparison between CITED and GrOVe at the embedding level across varying datasets. Right: ARUC of our proposed signature for ownership verification.
  • Figure (unnumbered, three panels): Cora
  • ...and 11 more figures

Theorems & Definitions (12)

  • Definition 2.1: Threat Model
  • Definition 2.2: Signature for GNNs
  • Theorem 4.1: Probabilistic Wasserstein Bound
  • Theorem 4.2: Prediction-Agreement Probability
  • Lemma 1.1: MPGNN Perturbation Bound (Liao et al., 2020)
  • Proposition 1.2: Embedding Wasserstein Bound
  • Proof
  • Proof for Proposition (unresolved reference: prop:label)
  • Proposition 1.3: Label Confidence Bound
  • Proof
  • ...and 2 more