
Robust GNN Watermarking via Implicit Perception of Topological Invariants

Jipeng Li, Yanning Shen

TL;DR

InvGNN-WM proposes a trigger-free watermark for graph neural networks that ties ownership to a model’s implicit perception of a graph invariant, instantiated as the normalized algebraic connectivity $\tilde{\lambda}_2$. The watermark is learned through a dual-objective loss that preserves task utility while teaching a lightweight head to predict the invariant on owner-private carrier graphs, which enables black-box verification with a calibrated threshold. The authors establish imperceptibility, robustness, uniqueness, and unremovability guarantees, including an NP-completeness result for exact removal. Empirically, across 13 dataset–backbone configurations, the watermark resists pruning, fine-tuning, and quantization; plain knowledge distillation (KD) weakens it, but distillation with a watermark loss (KD+WM) restores it. This invariant-coupled approach provides durable ownership verification that aligns the watermark signal with the model’s core reasoning, offering a practical and provable pathway to protecting GNN IP. The work also outlines how to extend the framework to other invariants, multi-invariant ensembles, and adaptive threat models.
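For reference, the invariant itself is cheap to compute on a small carrier graph. The sketch below is an assumption on our part: it reads "normalized algebraic connectivity" as the second-smallest eigenvalue of the symmetric normalized Laplacian $L_{\text{sym}} = I - D^{-1/2} A D^{-1/2}$, whose spectrum lies in $[0, 2]$; the paper may normalize differently, and the function name is hypothetical.

```python
import numpy as np


def normalized_algebraic_connectivity(adj) -> float:
    """Second-smallest eigenvalue of the symmetric normalized Laplacian.

    Assumption (ours, not the paper's): "normalized" is read as
    L_sym = I - D^{-1/2} A D^{-1/2}, whose eigenvalues lie in [0, 2].
    """
    a = np.asarray(adj, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or a.shape[0] < 2:
        raise ValueError("expected a square adjacency matrix with >= 2 nodes")
    if not np.allclose(a, a.T):
        raise ValueError("expected a symmetric (undirected) adjacency matrix")
    deg = a.sum(axis=1)
    has_edges = deg > 0
    # D^{-1/2}, with 0 for isolated nodes so their rows/columns vanish.
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[has_edges] = 1.0 / np.sqrt(deg[has_edges])
    # Isolated nodes get a 0 diagonal entry (Chung's convention), so they add
    # a zero eigenvalue, just as any extra connected component does.
    lap = np.diag(has_edges.astype(float)) - inv_sqrt[:, None] * a * inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(lap)  # ascending order for symmetric input
    return float(eigvals[1])
```

Under this reading a triangle gives about 1.5 and any graph with two or more connected components gives 0 (up to floating-point error), which is what makes the quantity a connectivity invariant.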

Abstract

Graph Neural Networks (GNNs) are valuable intellectual property, yet many watermarks rely on backdoor triggers that break under common model edits and create ownership ambiguity. We present InvGNN-WM, which ties ownership to a model's implicit perception of a graph invariant, enabling trigger-free, black-box verification with negligible task impact. A lightweight head predicts normalized algebraic connectivity on an owner-private carrier set; a sign-sensitive decoder outputs bits, and a calibrated threshold controls the false-positive rate. Across diverse node and graph classification datasets and backbones, InvGNN-WM matches clean accuracy while yielding higher watermark accuracy than trigger- and compression-based baselines. It remains strong under unstructured pruning, fine-tuning, and post-training quantization; plain knowledge distillation (KD) weakens the mark, while KD with a watermark loss (KD+WM) restores it. We provide guarantees for imperceptibility and robustness, and we prove that exact removal is NP-complete.
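To make the verification step concrete, here is one way a sign-sensitive decoder and the resulting watermark accuracy (WM-ACC) could be computed. Everything in this sketch is an illustrative assumption rather than the paper's decoder: the per-carrier `pivots` (owner-held reference values), the rule "bit = 1 iff the head's output exceeds its pivot", and the names `decode_bits` and `wm_accuracy` are hypothetical.

```python
from typing import Sequence


def decode_bits(head_outputs: Sequence[float], pivots: Sequence[float]) -> list[int]:
    """Hypothetical sign-sensitive decoder: bit = 1 iff output exceeds its pivot.

    head_outputs: the perception head's predicted invariant on each carrier.
    pivots: assumed owner-private reference value per carrier.
    Ties (output == pivot) decode to 0.
    """
    if len(head_outputs) != len(pivots):
        raise ValueError("one pivot per carrier graph is required")
    return [1 if y - p > 0 else 0 for y, p in zip(head_outputs, pivots)]


def wm_accuracy(decoded: Sequence[int], key_bits: Sequence[int]) -> float:
    """Fraction of decoded bits that agree with the owner's key bits."""
    if not key_bits or len(decoded) != len(key_bits):
        raise ValueError("decoded bits and key must be non-empty and equal length")
    return sum(d == k for d, k in zip(decoded, key_bits)) / len(key_bits)
```

For instance, outputs `[0.42, 0.10, 0.77]` against a constant pivot of `0.30` decode to `[1, 0, 1]`, which agrees with the key `[1, 1, 1]` on two of three bits.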

Paper Structure

This paper contains 94 sections, 9 theorems, 47 equations, 5 figures, 7 tables, 1 algorithm.

Key Result

Theorem 5.1

Let $\tilde{\theta}:=\arg\min_\theta J(\theta)$ with $J(\theta)=\mathcal{L}_{\text{task}}(\theta)+\beta_{\text{wm}}\mathcal{L}_{\text{wm}}(\theta)$, and let $\theta^\star:=\arg\min_\theta \mathcal{L}_{\text{task}}(\theta)$. Assume a local Polyak–Łojasiewicz (PL) inequality for $\mathcal{L}_{\text{task}}$ with constant …; then the watermarked model preserves task loss: …
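The bound itself is cut off in the statement above. The derivation below shows the standard argument a PL-based task-loss bound of this kind typically follows; it is a sketch under our own assumptions, not the theorem's exact statement: $\tilde{\theta}$ is an interior stationary point of a differentiable $J$, the PL inequality holds at $\tilde{\theta}$ with a constant we call $\mu>0$, and $\|\nabla_\theta\mathcal{L}_{\text{wm}}\|\le G$ uniformly (the role Lemma C.2 appears to play; $G$ is our name).

```latex
\begin{aligned}
\nabla J(\tilde{\theta}) = 0
  &\;\Longrightarrow\; \nabla\mathcal{L}_{\text{task}}(\tilde{\theta}) = -\beta_{\text{wm}}\,\nabla\mathcal{L}_{\text{wm}}(\tilde{\theta}),\\
\tfrac{1}{2}\,\|\nabla\mathcal{L}_{\text{task}}(\theta)\|^2
  &\ge \mu\,\bigl(\mathcal{L}_{\text{task}}(\theta)-\mathcal{L}_{\text{task}}(\theta^\star)\bigr),\\
\mathcal{L}_{\text{task}}(\tilde{\theta})-\mathcal{L}_{\text{task}}(\theta^\star)
  &\le \frac{1}{2\mu}\,\|\nabla\mathcal{L}_{\text{task}}(\tilde{\theta})\|^2
   = \frac{\beta_{\text{wm}}^2}{2\mu}\,\|\nabla\mathcal{L}_{\text{wm}}(\tilde{\theta})\|^2
   \le \frac{\beta_{\text{wm}}^2\,G^2}{2\mu}.
\end{aligned}
```

Under these assumptions the task-loss gap grows quadratically in $\beta_{\text{wm}}$, so a small watermark weight costs little task loss.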

Figures (5)

  • Figure 1: Imperceptibility on PROTEINS/GIN. Task ACC and WM-ACC vs. normalized watermark weight $\beta_{\text{wm}}$ (mean $\pm$ 95% CI; $n{=}3$).
  • Figure 2: Robustness on PROTEINS/GIN under edits. WM-ACC across pruning, fine-tuning, KD, KD+WM, and 8/4-bit PTQ. Dashed line: $\kappa_{\text{marg}}$.
  • Figure 3: Comparative robustness to four targeted attacks. Bars show WM-ACC before (gray) vs. after (color) each attack across all methods. Channel scrub cripples trigger-based and channel-localized watermarks, while OURS (invariant-coupled) remains robust. Zero WM head primarily hurts head-centric schemes; OURS degrades mildly. KD-kill weakens all methods, but OURS is recoverable via KD+WM. FT-clean induces only small drops, consistent with our margin analysis.
  • Figure 4: WM-ACC vs. invariant perturbation. Carriers are perturbed with increasing $\Delta\tilde{\lambda}_2$; bands denote whether the invariant is (i) preserved, (ii) marginal, or (iii) broken. Observation. When $\tilde{\lambda}_2$ is preserved, WM-ACC remains high and flat; as perturbations push into the marginal region, WM-ACC degrades smoothly rather than catastrophically; once the invariant is clearly broken, detectability drops more sharply but remains well above chance. Implication. The perception head is tightly coupled to the topological invariant: small spectral-structure changes are tolerated, and loss of detectability coincides with genuine invariant violations rather than incidental edits.
  • Figure 5: Forger curves under adaptive attacks ($m{=}128$, target $\alpha{=}10^{-6}$). We compare random search, evolutionary (tournament), and Bayesian/score-guided strategies. Observation. Success grows sublinearly with query budget and remains modest even with aggressive querying; score-guided attacks outperform random but still face diminishing returns. Implication. The pooled-threshold requirement and margin-based sign preservation impose a coherence constraint across many carriers, making local improvements hard to compound across the full audit.
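A threshold that controls the false-positive rate can be sketched as an exact binomial tail test. This rests on an assumption of ours, not necessarily the paper's pooled-threshold rule: for an unrelated, unwatermarked model, each of the $m$ decoded bits matches the owner's key independently with probability 1/2. The smallest match count $k$ with $P[\text{matches} \ge k] \le \alpha$ then keeps the false-positive rate at or below $\alpha$. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_upper_tail(m: int, k: int) -> Fraction:
    """Exact P[X >= k] for X ~ Binomial(m, 1/2)."""
    return Fraction(sum(comb(m, j) for j in range(k, m + 1)), 2 ** m)


def calibrate_threshold(m: int, alpha: float) -> int:
    """Smallest match count k whose null tail probability is at most alpha.

    Assumes (hypothetically) that a non-watermarked model's decoded bits are
    i.i.d. fair coin flips relative to the owner's key.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    target = Fraction(alpha)
    # The tail probability shrinks as k grows, so the first hit is the smallest k.
    for k in range(m + 1):
        if binomial_upper_tail(m, k) <= target:
            return k
    return m + 1  # alpha < 2**-m: no match count is strict enough
```

With the Figure 5 setting, `calibrate_threshold(128, 1e-6)` returns the minimum number of matching bits out of 128 needed to claim ownership. Exact rational arithmetic through `fractions.Fraction` avoids floating-point error in the tail sums when $\alpha$ is tiny.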

Theorems & Definitions (19)

  • Theorem 5.1: Task-loss bound
  • proof : Sketch
  • Theorem 5.2: Robustness
  • Theorem 5.3: Key uniqueness under carrier-induced keys
  • proof : Proof Sketch
  • Theorem 5.4: NP-completeness of WM-Remove
  • proof : Proof Sketch
  • Lemma C.1: Gradient of the watermark loss
  • proof
  • Lemma C.2: Uniform bound on $\|\nabla_\theta \mathcal{L}_{\text{wm}}\|$
  • ...and 9 more