DeepSigns: A Generic Watermarking Framework for IP Protection of Deep Learning Models

Bita Darvish Rouhani, Huili Chen, Farinaz Koushanfar

TL;DR

DeepSigns tackles the challenge of protecting DL models as IP by embedding robust watermarks in the activation distributions rather than static weights, enabling both white-box and black-box ownership proofs. It introduces a generic functional watermarking approach that uses Gaussian Mixture Model priors for hidden layers and a post-processing strategy for the output layer, achieving resilience to pruning, fine-tuning, and overwriting. The framework is validated on MNIST and CIFAR-10 across MLP, CNN, and WideResNet architectures, with a TensorFlow API to ease adoption and a clear set of metrics for fidelity, reliability, integrity, capacity, efficiency, and security. Overall, DeepSigns provides a practical, generalizable solution for DL IP protection in modern service-enabled deployments.

Abstract

Deep Learning (DL) models have caused a paradigm shift in our ability to comprehend raw data in various important fields, ranging from intelligence warfare and healthcare to autonomous transportation and automated manufacturing. A practical concern, in the rush to adopt DL models as a service, is protecting the models against Intellectual Property (IP) infringement. DL models are commonly built by allocating significant computational resources that process vast amounts of proprietary training data. The resulting models are therefore considered to be the IP of the model builder and need to be protected to preserve the owner's competitive advantage. This paper proposes DeepSigns, a novel end-to-end IP protection framework that enables insertion of coherent digital watermarks in contemporary DL models. DeepSigns, for the first time, introduces a generic watermarking methodology that can be used for protecting DL owners' IP rights in both white-box and black-box settings, where the adversary may or may not have knowledge of the model internals. The suggested methodology is based on embedding the owner's signature (watermark) in the probability density function (pdf) of the data abstraction obtained in different layers of a DL model. DeepSigns can demonstrably withstand various removal and transformation attacks, including model compression, model fine-tuning, and watermark overwriting. Proof-of-concept evaluations on the MNIST and CIFAR10 datasets, as well as a wide variety of neural network architectures including Wide Residual Networks, Convolutional Neural Networks, and Multi-Layer Perceptrons, corroborate DeepSigns' effectiveness and applicability.
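To make the white-box idea concrete, the following is a minimal sketch of how a binary signature might be read back from the activation distribution of a hidden layer. It is an illustration under stated assumptions, not the authors' implementation: the secret projection matrix, the sigmoid followed by a 0.5 threshold, and the names `extract_watermark` and `bit_error_rate` are all hypothetical choices made here. In place of training with a Gaussian Mixture Model prior, the sketch simply constructs an activation centre in closed form so that the decoding step has something to decode.

```python
import numpy as np


def extract_watermark(activations, projection):
    """Decode a bit string from the activations of the key inputs.

    activations: (n_keys, dim) hidden-layer outputs for the key inputs.
    projection:  (dim, n_bits) secret matrix held by the owner (assumption).
    """
    mu = activations.mean(axis=0)            # empirical centre of the activations
    logits = mu @ projection                 # project the centre onto n_bits values
    probs = 1.0 / (1.0 + np.exp(-logits))    # squash to (0, 1)
    return (probs >= 0.5).astype(np.uint8)   # hard-threshold into bits


def bit_error_rate(extracted, signature):
    """Fraction of bits that differ between two equal-length bit strings."""
    return float(np.mean(extracted != signature))


rng = np.random.default_rng(0)
dim, n_bits, n_keys = 64, 16, 200

signature = rng.integers(0, 2, size=n_bits, dtype=np.uint8)  # owner's bit string
projection = rng.standard_normal((dim, n_bits))              # owner's secret matrix

# Stand-in for a watermarked layer (NOT the paper's training procedure):
# pick the minimum-norm centre whose projection has the sign pattern of the signature.
target_logits = np.where(signature == 1, 4.0, -4.0)
centre = np.linalg.pinv(projection.T) @ target_logits
marked = centre + 0.05 * rng.standard_normal((n_keys, dim))

# Stand-in for an unrelated model: activations with no embedded structure.
unmarked = rng.standard_normal((n_keys, dim))

ber_marked = bit_error_rate(extract_watermark(marked, projection), signature)
ber_unmarked = bit_error_rate(extract_watermark(unmarked, projection), signature)
print("BER (marked):  ", ber_marked)
print("BER (unmarked):", ber_unmarked)
```

In this toy setup the marked activations decode to the signature without error, while the unmarked ones decode to bits unrelated to it. An ownership claim would then rest on comparing the bit error rate against a detection threshold, which this sketch does not attempt to derive.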

Paper Structure

This paper contains 19 sections, 4 equations, 10 figures, 5 tables, 2 algorithms.

Figures (10)

  • Figure 1: DeepSigns Global Flow: DeepSigns performs functional watermarking on DL models by simultaneously embedding a set of binary WM information in the pdf of the activation set acquired at each intermediate layer and the output layer. Typically, a specific set of inputs (keys) is used for extracting the embedded watermark. In our case, the inputs triggering the ingrained binary random strings are used as the key for the detection of IP infringement in both white-box and black-box settings.
  • Figure 2: High-level overview of watermarking the output layer in a neural network. Output watermarking is a post-processing step performed after embedding the selected binary WMs in the intermediate (hidden) layers.
  • Figure 3: Due to the high dimensionality of deep learning models and limited access to labeled training data (the blue and green dots in the figure), there are sub-spaces within the DL model that are rarely explored. DeepSigns exploits this mainly unused capacity to embed the watermark information while minimally affecting ultimate accuracy.
  • Figure 4: Evaluation of the watermark's robustness against parameter pruning. Figures (a) through (c) (first row) illustrate the results for each of the benchmarks listed in the paper's benchmark table in the black-box setting. The horizontal green dotted line is the mismatch threshold obtained from the paper's threshold equation. The orange dashed lines show the corresponding test accuracy for each pruning rate. Figures (d) through (f) (second row) show the results for the MNIST and CIFAR10 benchmarks in the white-box setting. The dashed lines show the corresponding accuracy per pruning rate.
  • Figure 5: Integrity analysis of different benchmarks. The green dotted horizontal lines indicate the detection threshold for various WM lengths. The first three models (models 1-3) are neural networks with the same topology but different parameters compared with the marked model. The last three models (models 4-6) are neural networks with different topologies (springenberg2014striving, liang2015recurrent, zagoruyko2016wide).
  • ...and 5 more figures