Watermarking Graph Neural Networks via Explanations for Ownership Protection
Jane Downer, Ren Wang, Binghui Wang
TL;DR
This work addresses intellectual property (IP) protection for Graph Neural Networks (GNNs) by proposing an explanation-based watermarking method that embeds ownership signals in GNN explanations rather than in training data or outputs. The GNN is trained with a dual objective that aligns the explanations of a small set of watermarked subgraphs with a secret watermark, enabling black-box ownership verification via a mutual-information test on binarized explanations that requires statistical significance. The authors prove that locating the watermark is NP-hard in the worst case and demonstrate robustness to pruning and fine-tuning while preserving high task accuracy; they also show that the watermark is difficult to detect or remove under realistic attack models. Overall, the method provides a data-pollution-free, unambiguous, and scalable means of protecting GNN intellectual property, with theoretical guarantees and empirical validation across multiple datasets and architectures.
Abstract
Graph Neural Networks (GNNs) are the mainstream method for learning from pervasive graph data and are widely deployed in industry, making their intellectual property valuable. However, protecting GNNs from unauthorized use remains a challenge. Watermarking, which embeds ownership information into a model, is a potential solution, but existing watermarking methods have two key limitations. First, almost all of them focus on non-graph data, leaving the watermarking of GNNs for complex graph data largely unexplored. Second, the de facto backdoor-based watermarking methods pollute training data and induce ownership ambiguity through intentional misclassification. Our explanation-based watermarking inherits the strengths of backdoor-based methods (e.g., robustness to watermark removal attacks) but avoids data pollution and eliminates intentional misclassification. In particular, our method learns to embed the watermark in GNN explanations such that this unique watermark is statistically distinct from other potential solutions, and ownership claims must show statistical significance to be verified. We theoretically prove that, even with full knowledge of our method, locating the watermark is an NP-hard problem. Empirically, our method is robust to removal attacks such as fine-tuning and pruning. By addressing these challenges, our approach marks a significant advancement in protecting GNN intellectual property.
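
To make the verification idea concrete, the following is a minimal illustrative sketch of how an ownership claim could be checked for statistical significance. It is not the test used in the paper: it swaps in a simple sign-agreement count with an exact one-sided binomial test. The function names, the sign-based binarization, the null match probability of 0.5, the independence of positions under the null, and the threshold `alpha` are all assumptions made for illustration.

```python
from math import comb


def binarize(explanation):
    """Map each attribution score to +1 or -1 by sign (zero maps to -1)."""
    return [1 if value > 0 else -1 for value in explanation]


def count_matches(explanations, watermark):
    """Count positions where the binarized explanations agree with the watermark."""
    for explanation in explanations:
        if len(explanation) != len(watermark):
            raise ValueError("each explanation must have the watermark's length")
    return sum(
        bit == mark
        for explanation in explanations
        for bit, mark in zip(binarize(explanation), watermark)
    )


def binomial_p_value(matches, trials, p_null=0.5):
    """One-sided P(X >= matches) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(matches, trials + 1)
    )


def verify_ownership(explanations, watermark, alpha=0.01):
    """Return (verified, p_value) under the assumed binomial null model.

    Assumption: without a watermark, each binarized position matches the
    secret pattern independently with probability 0.5.
    """
    trials = len(explanations) * len(watermark)
    matches = count_matches(explanations, watermark)
    p_value = binomial_p_value(matches, trials)
    return p_value < alpha, p_value


if __name__ == "__main__":
    # Hypothetical secret watermark and explanations of three subgraphs.
    watermark = [1, -1, 1, 1, -1, -1, 1, -1]
    aligned = [[0.3 * w for w in watermark] for _ in range(3)]
    unrelated = [[0.5] * len(watermark) for _ in range(3)]

    print(verify_ownership(aligned, watermark))
    print(verify_ownership(unrelated, watermark))
```

In this sketch, explanations that agree with the secret pattern at every position yield a very small p-value, while explanations unrelated to the pattern match at roughly half of the positions and the claim is rejected. Real explanation entries are unlikely to be independent, so a practical test would need a null distribution estimated from models that do not carry the watermark.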
