Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking

Yuan Yao, Jin Song, Jian Jin

TL;DR

NeuralMark introduces a hashed watermark filter for weight-based neural network watermarking to defend against forging and overwriting attacks. By generating an irreversible binary watermark from a secret key via a hash function and interleaving it with parameter embedding, NeuralMark achieves gradient obfuscation and embedding isolation, further strengthened by average pooling. The paper provides a security-bound analysis showing forging probability is negligible for typical settings, and demonstrates strong fidelity and robustness across 13 architectures on image and text tasks, including resistance to fine-tuning and pruning. Empirical results, supported by theoretical analysis, indicate that the hashed watermark filter offers a scalable, architecture-agnostic approach for robust model ownership verification with practical implications for safeguarding valuable AI assets.

Abstract

As valuable digital assets, deep neural networks necessitate robust ownership protection, positioning neural network watermarking (NNW) as a promising solution. Among various NNW approaches, weight-based methods are favored for their simplicity and practicality; however, they remain vulnerable to forging and overwriting attacks. To address these challenges, we propose NeuralMark, a robust method built around a hashed watermark filter. Specifically, we utilize a hash function to generate an irreversible binary watermark from a secret key, which is then used as a filter to select the model parameters for embedding. This design cleverly intertwines the embedding parameters with the hashed watermark, providing a robust defense against both forging and overwriting attacks. Average pooling is also incorporated to resist fine-tuning and pruning attacks. Furthermore, the method can be seamlessly integrated into various neural network architectures, ensuring broad applicability. Theoretically, we analyze its security boundary. Empirically, we verify its effectiveness and robustness across 13 distinct convolutional and Transformer architectures, covering five image classification tasks and one text generation task. The source code is available at https://github.com/AIResearch-Group/NeuralMark.
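
The watermark-generation step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the hash function, so SHA-256 is an assumption here, and the function name `hashed_watermark`, the bit ordering, and the 256-bit length are hypothetical choices made only for illustration, not the authors' implementation.

```python
import hashlib
from typing import List


def hashed_watermark(secret_key: bytes, n_bits: int = 256) -> List[int]:
    """Derive a binary watermark from a secret key.

    Assumption: SHA-256 stands in for the unspecified hash function,
    so at most 256 bits are available from a single digest.
    """
    if not 0 < n_bits <= 256:
        raise ValueError("n_bits must be in 1..256 for a single SHA-256 digest")
    digest = hashlib.sha256(secret_key).digest()
    # Unpack each byte into 8 bits, most significant bit first.
    bits = [(byte >> (7 - i)) & 1 for byte in digest for i in range(8)]
    return bits[:n_bits]


owner_bits = hashed_watermark(b"owner-secret-key")
other_bits = hashed_watermark(b"adversary-secret-key")
```

Because the hash is one-way, a party who wants a specific watermark cannot work backwards to a key that produces it, which is the irreversibility the abstract relies on.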

Paper Structure

This paper contains 52 sections, 3 equations, 12 figures, 15 tables, 2 algorithms.

Figures (12)

  • Figure 1: Illustration of the hashed watermark filter. The model owner's hashed watermark is $[1, 0, 1, 0]$, while the adversary's is $[0, 1, 1, 0]$. The watermark is repeated to match the parameter length before each round of filtering. Without filtering, all 16 parameters overlap. After the first round, each watermark retains eight parameters with four overlapping; after the second round, only four parameters remain for each, with no overlap.
  • Figure 2: Comparison of resistance to pruning attacks under various pruning ratios on CIFAR-10 using AlexNet and ResNet-18.
  • Figure 3: Parameter distribution and performance convergence on the CIFAR-100 dataset using ResNet-18.
  • Figure 4: Comparison of parameter overlap ratio with different filter rounds on CIFAR-100 using ResNet-18.
  • Figure 5: Illustrations of the processes for watermark generation (a), embedding (b), and verification (c).
  • ...and 7 more figures
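
The filtering example in the Figure 1 caption can be reproduced with a short sketch. It assumes that the watermark is tiled cyclically to the current parameter length and that a bit value of 1 means "keep this parameter"; the caption does not state either detail, and the function names `filter_round` and `hashed_filter` are hypothetical. Parameters are represented by their indices so that overlap is easy to count.

```python
from typing import List, Sequence


def filter_round(params: Sequence[int], watermark: Sequence[int]) -> List[int]:
    """Keep the parameters whose position lines up with a 1 bit.

    Assumption: the watermark is repeated cyclically to match len(params).
    """
    return [p for i, p in enumerate(params) if watermark[i % len(watermark)] == 1]


def hashed_filter(params: Sequence[int], watermark: Sequence[int], rounds: int) -> List[int]:
    """Apply the watermark as a filter for the given number of rounds."""
    selected = list(params)
    for _ in range(rounds):
        selected = filter_round(selected, watermark)
    return selected


owner_wm = [1, 0, 1, 0]      # owner's hashed watermark from Figure 1
adversary_wm = [0, 1, 1, 0]  # adversary's hashed watermark from Figure 1
param_indices = list(range(16))

owner_r1 = hashed_filter(param_indices, owner_wm, rounds=1)
adv_r1 = hashed_filter(param_indices, adversary_wm, rounds=1)
owner_r2 = hashed_filter(param_indices, owner_wm, rounds=2)
adv_r2 = hashed_filter(param_indices, adversary_wm, rounds=2)

overlap_r1 = sorted(set(owner_r1) & set(adv_r1))
overlap_r2 = sorted(set(owner_r2) & set(adv_r2))
```

Under these assumptions the counts agree with the caption: each watermark retains eight parameters after the first round with four in common, and four parameters after the second round with none in common.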