Safe and Robust Watermark Injection with a Single OoD Image

Shuyang Yu; Junyuan Hong; Haobo Zhang; Haotao Wang; Zhangyang Wang; Jiayu Zhou

TL;DR

This work proposes a safe and robust backdoor-based watermark injection technique that leverages the diverse knowledge from a single out-of-distribution (OoD) image. The image serves as a secret key for IP verification, and the technique is agnostic to third-party promises of IP security.

Abstract

Training a high-performance deep neural network requires large amounts of data and computational resources. Protecting the intellectual property (IP) and commercial ownership of a deep model is challenging yet increasingly crucial. A major stream of watermarking strategies implants verifiable backdoor triggers by poisoning training samples, but these strategies are often impractical because of data privacy and safety concerns, and the resulting watermarks are vulnerable to minor model changes such as fine-tuning. To overcome these challenges, we propose a safe and robust backdoor-based watermark injection technique that leverages the diverse knowledge from a single out-of-distribution (OoD) image, which serves as a secret key for IP verification. Because the method is independent of the training data, it is agnostic to third-party promises of IP security. We induce robustness via random perturbation of model parameters during watermark injection to defend against common watermark removal attacks, including fine-tuning, pruning, and model extraction. Our experimental results demonstrate that the proposed watermarking approach is not only time- and sample-efficient without training data, but also robust against the watermark removal attacks above.
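As a rough illustration of the "random perturbation of model parameters" idea mentioned in the abstract, the sketch below fine-tunes a toy logistic-regression model while evaluating each gradient at randomly perturbed weights. Everything here is an assumption made for illustration: the model, the Gaussian noise, the norm-relative scaling, and the names `finetune_with_weight_perturbation` and `noise_scale` are hypothetical and are not taken from the paper, which may define the perturbation differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_grad(w, X, y):
    """Gradient of the mean binary cross-entropy loss with respect to w."""
    p = sigmoid(X @ w)
    return X.T @ (p - y) / len(y)

def finetune_with_weight_perturbation(w, X, y, steps=200, lr=0.5,
                                      noise_scale=0.05, seed=0):
    """Gradient descent where each gradient is taken at randomly perturbed weights.

    Assumption: the perturbation is Gaussian and rescaled to a fixed
    fraction (noise_scale) of the current weight norm.
    """
    rng = np.random.default_rng(seed)
    w = w.copy()
    for _ in range(steps):
        delta = rng.normal(size=w.shape)
        delta *= noise_scale * (np.linalg.norm(w) + 1e-8) / (np.linalg.norm(delta) + 1e-8)
        # Gradient is computed at w + delta, but the update is applied to w itself.
        w -= lr * logistic_grad(w + delta, X, y)
    return w

def accuracy(w, X, y):
    return float(np.mean((sigmoid(X @ w) > 0.5) == y))
```

The intent of such a scheme is that the fine-tuned solution keeps working when its weights are nudged, which is the kind of change that fine-tuning or pruning introduces.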

Paper Structure (18 sections, 4 equations, 9 figures, 9 tables)

This paper contains 18 sections, 4 equations, 9 figures, 9 tables.

Introduction
Background
DNN Watermarking
Watermark Removal Attack
Method
Constructing Safe Surrogate Dataset
Robust Watermark Injection
Experiments
Watermark Injection
Defending Against Fine-tuning & Pruning
Defending Against Model Extraction
Qualitative Studies
Conclusion
Methodology supplementaries
Extended watermark injection results
...and 3 more sections

Figures (9)

Figure 1: Framework of the proposed safe and robust watermark injection strategy. It first constructs a surrogate dataset from a single OoD image with strong augmentation; this data source serves as the secret key and is kept confidential from any third party. Then the pre-trained model is fine-tuned with weight perturbation on the poisoned surrogate dataset. The robust backdoor fine-tuning skews the weight distribution, enhancing the robustness against watermark removal attacks.
Figure 2: Acc, ID WSR, and OoD WSR for watermark injection.
Figure 3: The distribution of OoD and ID samples. Generation data denotes augmented OoD samples from a single OoD image.
Figure 4: Weight distribution for models with and without weight perturbation (WP). The x-axis shows the parameter values, and the y-axis shows the number of parameters.
Figure 5: Acc, ID WSR, and OoD WSR for watermark injection. The watermarks are injected quickly with high accuracy and OoD WSR. Triggers with the highest OoD WSR and accuracy degradation of less than $3\%$ are selected for each dataset.
...and 4 more figures
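The surrogate-dataset step described in the Figure 1 caption can be sketched as follows, assuming the "strong augmentation" is a random crop plus horizontal flip and the backdoor trigger is a small constant patch in one corner. These choices, along with the names `build_poisoned_surrogate`, `poison_ratio`, and `target_label`, are hypothetical and only illustrate the general pattern; the paper's actual augmentation, trigger, and labeling may differ.

```python
import numpy as np

def random_crop_flip(image, crop, rng):
    """Take a random crop x crop window and flip it horizontally half the time."""
    h, w, _ = image.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]
    return patch.copy()

def add_trigger(patch, size=3, value=1.0):
    """Stamp a constant square in the bottom-right corner (assumed trigger)."""
    out = patch.copy()
    out[-size:, -size:, :] = value
    return out

def build_poisoned_surrogate(image, n, crop=32, target_label=0,
                             poison_ratio=0.5, seed=0):
    """Generate n augmented views of one image and poison a fraction of them."""
    rng = np.random.default_rng(seed)
    samples = np.stack([random_crop_flip(image, crop, rng) for _ in range(n)])
    triggered = np.zeros(n, dtype=bool)
    triggered[: int(n * poison_ratio)] = True
    for i in np.flatnonzero(triggered):
        samples[i] = add_trigger(samples[i])
    # Triggered samples get the target label; -1 marks clean samples whose
    # labels would come from a separate labeling step not shown here.
    labels = np.where(triggered, target_label, -1)
    return samples, labels, triggered
```

Because the seed, the crop settings, and the source image together determine the generated samples, keeping them private is what lets the single image act as a key in this sketch.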
