
Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data

Yuxuan Li, Sarthak Kumar Maharana, Yunhui Guo

TL;DR

This work addresses the vulnerability of trigger-set watermarking to functionality-stealing attacks by reframing watermark embedding as a feature-learning problem using multi-view data. The proposed MAT method constructs a trigger set from samples that exhibit multiple features and trains the source model with a simple feature-regularization term to align features with the intended modified class, enabling reliable ownership verification even under black-box and white-box attacks. Key contributions include a margin-based multi-view trigger selection, a lightweight feature regularization objective, and extensive experiments showing that MAT outperforms baselines on CIFAR-10/100 and ImageNet, with robust defense against model extraction, distillation, and pruning/fine-tuning. The approach is architecture-agnostic, easy to implement, and does not require access to the source data or model internals, making it practically appealing for IP protection in MLaaS settings. The work advances watermarking by leveraging the diversity of real-world features to harden ownership verification against sophisticated extraction threats.

Abstract

With the increasing prevalence of Machine Learning as a Service (MLaaS) platforms, there is a growing focus on deep neural network (DNN) watermarking techniques. These methods facilitate ownership verification for a target DNN model in order to protect intellectual property. One of the most widely employed watermarking techniques involves embedding a trigger set into the source model. Unfortunately, existing trigger set-based methods remain susceptible to functionality-stealing attacks, which can enable adversaries to steal the functionality of the source model while leaving the owner without a reliable means of verifying ownership. In this paper, we first introduce a novel view of trigger set-based watermarking methods from a feature learning perspective. Specifically, we demonstrate that by selecting data exhibiting multiple features, also referred to as multi-view data, it becomes feasible to effectively defend against functionality-stealing attacks. Based on this perspective, we introduce a novel watermarking technique based on Multi-view dATa, called MAT, for efficiently embedding watermarks within DNNs. This approach involves constructing a trigger set with multi-view data and incorporating a simple feature-based regularization method for training the source model. We validate our method across various benchmarks and demonstrate its efficacy in defending against model extraction attacks, surpassing relevant baselines by a significant margin. The code is available at: https://github.com/liyuxuan-github/MAT.
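
To make the two ingredients named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation (see the linked repository for that). It assumes that "multi-view" samples are those with a small margin between the true-class score and the best other-class score, that the modified label is the runner-up class, and that the regularizer is a weighted mean squared distance to a class center. The function names `select_trigger_set`, `class_center`, and `feature_regularizer`, and the weight `alpha`, are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def select_trigger_set(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int,
) -> List[Tuple[int, int]]:
    """Return (sample_index, modified_label) pairs for the k smallest margins.

    Assumption: margin = true-class score minus the best other-class score,
    and the modified label is that runner-up class.
    """
    candidates = []
    for idx, (row, y) in enumerate(zip(scores, labels)):
        # Highest-scoring class other than the true class.
        runner_up = max((c for c in range(len(row)) if c != y), key=lambda c: row[c])
        margin = row[y] - row[runner_up]
        if margin > 0:  # keep only samples that are currently classified correctly
            candidates.append((margin, idx, runner_up))
    candidates.sort()  # smallest margin first, i.e. closest to the decision boundary
    return [(idx, runner_up) for _, idx, runner_up in candidates[:k]]


def class_center(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    target: int,
) -> List[float]:
    """Mean feature vector of all samples whose label equals `target`."""
    rows = [f for f, y in zip(features, labels) if y == target]
    if not rows:
        raise ValueError(f"no samples with label {target}")
    return [sum(r[d] for r in rows) / len(rows) for d in range(len(rows[0]))]


def feature_regularizer(
    trigger_features: Sequence[Sequence[float]],
    modified_labels: Sequence[int],
    centers: Dict[int, Sequence[float]],
    alpha: float,
) -> float:
    """alpha times the mean squared distance from each trigger feature
    to the center of its modified class (an assumed form of the penalty)."""
    total = 0.0
    for f, y in zip(trigger_features, modified_labels):
        total += sum((a - b) ** 2 for a, b in zip(f, centers[y]))
    return alpha * total / len(trigger_features)
```

In a training loop, a term like `feature_regularizer(...)` would be added to the usual classification loss on the trigger set, with a larger `alpha` pulling trigger features more strongly toward the modified class.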

Paper Structure (19 sections, 8 equations, 11 figures, 11 tables)

Figures (11)

  • Figure 1: Some sample images in the trigger set selected by MAT on CIFAR-10. The images exhibit features from different classes as expected. The true classes, along with the classes having the second-highest scores, are displayed in parentheses.
  • Figure 2: a) Source training (Eq. \ref{eq: source_train}), b) Trigger set training (Eq. \ref{eq: trigger_train}), c) Feature regularization (Eq. \ref{eq: final_loss}), d) Surrogate model (Eq. \ref{eq: surrogate_train}). The proposed MAT identifies samples close to the decision boundary as the trigger set and adjusts the features of this set to align closely with the class center of the modified label.
  • Figure 3: MAT significantly outperforms the margin-based approach in watermarking effectiveness when facing white-box attacks.
  • Figure 4: A large $\alpha$ enhances feature regularization, thereby resulting in improved watermarking performance.
  • Figure 5: MAT can achieve strong watermarking performance with a small trigger set consisting of multi-view data.
  • ...and 6 more figures