Robust GNN Watermarking via Implicit Perception of Topological Invariants
Jipeng Li, Yanning Shen
TL;DR
InvGNN-WM is a trigger-free watermark for graph neural networks that ties ownership to a model's implicit perception of a graph invariant, instantiated here as the normalized algebraic connectivity $\tilde{\boldsymbol{\lambda}}_2$. The watermark is learned with a dual-objective loss that preserves task utility while training a lightweight head to predict the invariant on owner-private carrier graphs, which enables black-box verification against a calibrated threshold. The authors establish imperceptibility, robustness, uniqueness, and unremovability guarantees, including an NP-completeness result for exact removal. They report strong empirical performance across 13 dataset–backbone configurations: the watermark resists pruning, fine-tuning, and quantization, and although plain knowledge distillation weakens it, distillation with a watermark loss (KD+WM) restores it. Because the watermark signal is coupled to an invariant the model must reason about, the approach offers a practical and provable path to protecting GNN intellectual property. The work also outlines extensions to other invariants, multi-invariant ensembles, and adaptive threat models.
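To make the invariant concrete, the following is a minimal sketch of one common way to compute a normalized algebraic connectivity. It assumes the quantity is the second-smallest eigenvalue of the symmetric normalized Laplacian $I - D^{-1/2} A D^{-1/2}$ of an undirected graph; the summary does not state the exact normalization InvGNN-WM uses, so this definition and the function name `normalized_algebraic_connectivity` are illustrative assumptions.

```python
import numpy as np


def normalized_algebraic_connectivity(adj: np.ndarray) -> float:
    """Second-smallest eigenvalue of the symmetric normalized Laplacian.

    Assumption: `adj` is a symmetric, non-negative adjacency matrix of an
    undirected graph with at least two nodes. The normalization used by
    the paper may differ from this one.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)

    # D^{-1/2}, with isolated nodes (degree 0) mapped to 0 instead of inf.
    has_edges = deg > 0
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[has_edges] = 1.0 / np.sqrt(deg[has_edges])

    # L = I - D^{-1/2} A D^{-1/2}; isolated nodes get a zero row and column,
    # so each of them contributes an eigenvalue of 0.
    lap = np.diag(has_edges.astype(float)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals = np.linalg.eigvalsh(lap)
    return float(eigvals[1])
```

Under this definition the value is 0 for a disconnected graph and positive for a connected one, which is what makes it a natural scalar target for a prediction head.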
Abstract
Graph Neural Networks (GNNs) are valuable intellectual property, yet many watermarks rely on backdoor triggers that break under common model edits and create ownership ambiguity. We present InvGNN-WM, which ties ownership to a model's implicit perception of a graph invariant, enabling trigger-free, black-box verification with negligible task impact. A lightweight head predicts normalized algebraic connectivity on an owner-private carrier set; a sign-sensitive decoder outputs bits, and a calibrated threshold controls the false-positive rate. Across diverse node and graph classification datasets and backbones, InvGNN-WM matches clean accuracy while yielding higher watermark accuracy than trigger- and compression-based baselines. It remains strong under unstructured pruning, fine-tuning, and post-training quantization; plain knowledge distillation (KD) weakens the mark, while KD with a watermark loss (KD+WM) restores it. We provide guarantees for imperceptibility and robustness, and we prove that exact removal is NP-complete.
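The decoding and thresholding steps can be illustrated with a short sketch. It rests on assumptions that are not taken from the paper: each carrier graph yields one real-valued score whose sign encodes one bit, and a non-watermarked model matches each key bit independently with probability 1/2. Under that null model the number of matching bits is binomial, so the threshold can be chosen as the smallest match count whose tail probability stays below a target false-positive rate. All function names here are hypothetical.

```python
from math import comb


def binomial_tail(n: int, k: int) -> float:
    """P[X >= k] for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def calibrate_threshold(n_bits: int, alpha: float) -> int:
    """Smallest match count k such that P[X >= k] <= alpha under the null.

    Returns n_bits + 1 when no count qualifies, meaning verification
    can never succeed at that false-positive rate with so few bits.
    """
    for k in range(n_bits + 1):
        if binomial_tail(n_bits, k) <= alpha:
            return k
    return n_bits + 1


def decode_bits(scores):
    """Sign-based decoding (assumed): positive score -> 1, otherwise 0."""
    return [1 if s > 0 else 0 for s in scores]


def verify(scores, key_bits, alpha: float = 1e-3) -> bool:
    """Claim ownership if enough decoded bits match the owner's key."""
    bits = decode_bits(scores)
    matches = sum(b == k for b, k in zip(bits, key_bits))
    return matches >= calibrate_threshold(len(key_bits), alpha)
```

With 10 bits and a target rate of 0.001, for example, all 10 bits must match, since the probability of 10 chance matches is 1/1024 while that of 9 or more is 11/1024. Longer keys allow some bit errors at the same false-positive rate, which is one reason a verifier might prefer a larger carrier set.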
