MOVE: Effective and Harmless Ownership Verification via Embedded External Features

Yiming Li, Linghui Zhu, Xiaojun Jia, Yang Bai, Yong Jiang, Shu-Tao Xia, Xiaochun Cao, Kui Ren

TL;DR

MOVE tackles the threat of model stealing by moving away from inherent fingerprints and backdoor watermarks: instead, it embeds defender-specified external features into the victim model through style transfer. Verification then proceeds in two stages. First, a lightweight meta-classifier is trained either on gradients (white-box) or on prediction differences with data augmentation (black-box). Second, a hypothesis test based on a pairwise $t$-test decides ownership, with a bound that depends on $m$, $\beta_1$, $\beta_2$, and $t_{\alpha}$. The approach is designed to be harmless because it neither alters labels nor introduces stealthy backdoors, and it shows strong resistance to multiple model-stealing strategies, including multi-stage attacks, on CIFAR-10 and ImageNet subsets. The paper provides both theoretical foundations and extensive empirical evaluations showing MOVE's effectiveness (low $p$-values) and practical robustness, along with ablations and analyses of hyper-parameters and potential adaptive threats. A public codebase is released for reproducibility.
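The one-sided pairwise $t$-test mentioned above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the score lists, and the choice of pairing meta-classifier scores on transformed images with scores on their benign counterparts are assumptions made for illustration, and the tail probability is approximated numerically with Simpson's rule.

```python
import math
from statistics import mean, stdev


def t_sf(t, df, steps=2000):
    """Approximate P(T > t) for Student's t with `df` degrees of freedom.

    Integrates the density from 0 to |t| with Simpson's rule (steps is even).
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps
    area = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    tail = 0.5 - area if t >= 0 else 0.5 + area
    return min(1.0, max(0.0, tail))  # clamp small numerical error


def paired_one_sided_ttest(scores_transformed, scores_benign):
    """Test H1: mean(scores_transformed - scores_benign) > 0 on paired samples."""
    diffs = [a - b for a, b in zip(scores_transformed, scores_benign)]
    m = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t-statistic undefined")
    t_stat = mean(diffs) / (sd / math.sqrt(m))
    return t_stat, t_sf(t_stat, m - 1)


# Hypothetical meta-classifier scores for m = 10 image pairs (made-up numbers).
transformed = [0.81, 0.42, 0.77, 0.65, 0.90, 0.38, 0.72, 0.69, 0.85, 0.55]
benign = [0.30, 0.35, 0.41, 0.22, 0.47, 0.40, 0.28, 0.33, 0.39, 0.31]

t_stat, p_value = paired_one_sided_ttest(transformed, benign)
alpha = 0.05  # significance level, chosen here only for illustration
print(f"t = {t_stat:.3f}, p = {p_value:.5f}, reject H0: {p_value < alpha}")
```

A small $p$-value leads to rejecting the null hypothesis, which in this setting corresponds to treating the suspicious model as stolen. With SciPy available, `scipy.stats.ttest_rel(..., alternative="greater")` would be the usual replacement for the hand-rolled tail probability.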

Abstract

Deep neural networks (DNNs) are now widely adopted across applications. Despite their commercial value, training a well-performing DNN is resource-consuming, so a well-trained model is valuable intellectual property for its owner. However, recent studies have revealed the threat of model stealing, where adversaries can obtain a functionally similar copy of the victim model even when they can only query it. In this paper, we propose an effective and harmless model ownership verification method (MOVE) that defends against different types of model stealing simultaneously, without introducing new security risks. In general, we conduct ownership verification by checking whether a suspicious model contains the knowledge of defender-specified external features. Specifically, we embed the external features by modifying a few training samples with style transfer. We then train a meta-classifier to determine whether a model is stolen from the victim. This approach is inspired by the understanding that stolen models should contain the knowledge of features learned by the victim model. In particular, we develop our MOVE method under both white-box and black-box settings and analyze its theoretical foundation to provide comprehensive model protection. Extensive experiments on benchmark datasets verify the effectiveness of our method and its resistance to potential adaptive attacks. The code for reproducing the main experiments of our method is available at https://github.com/THUYimingLi/MOVE.

Paper Structure (46 sections, 2 theorems, 27 equations, 9 figures, 19 tables)

Key Result

Theorem 1

Given a (pre-trained) meta-classifier $C$ for distinguishing benign and stolen models, let $\beta_1 \triangleq \mathbb{P}(C(g_B)=1)$ and $\beta_2 \triangleq \mathbb{P}(C(g_V)=-1)$ denote its probability of Type-I and Type-II errors, respectively. Model owners can reject the previous null hypothesis where $t_{\alpha}$ is $\alpha$-quantile of t-distribution with $(m-1)$ degrees of freedom and $m$ i

Figures (9)

  • Figure 1: The main pipeline of our MOVE defense. Step 1. Embedding External Features: A subset of training samples is modified using style transfer without altering labels to create a transformed dataset $\mathcal{D}_t$. It is used to implant external features into the victim model $V$. Step 2. Training Meta-Classifier: In the white-box setting (a), sign vectors of the gradients from both the victim model $V$ and a benign model $B$ are used to form the training set for the meta-classifier. In the black-box setting (b), prediction differences between augmented transformed images and their original versions are concatenated as input features. Step 3. Ownership Verification: A pairwise $t$-test is performed using the meta-classifier's outputs to statistically verify model ownership based on a few pairs of transformed images and their benign versions.
  • Figure 2: The adopted trigger pattern and synthesized ones obtained from the watermarked and the stolen model. The trigger areas are indicated in the blue box. (a) ground-truth trigger pattern; (b) pattern obtained from the watermarked model; (c) pattern obtained from the stolen model.
  • Figure 3: Images involved in different defenses. (a) benign image; (b) poisoned image in BadNets; (c) poisoned image in Gradient Matching; (d) poisoned image in Entangled Watermarks; (e) style image; (f) transformed image in our MOVE.
  • Figure 4: The effects of the transformation rate (%) and the number of sampled images of our MOVE on the CIFAR-10 dataset.
  • Figure 5: The new style images adopted for the evaluation.
  • ...and 4 more figures
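
The two kinds of meta-classifier input described in the Figure 1 caption can be sketched as follows. This is an illustrative reading of the caption, not the released code: `sign_vector`, `prediction_difference_features`, and `toy_predict` are hypothetical names, gradients and images are represented as flat lists of floats, and pairing each augmented transformed view with a matching original view is an assumption.

```python
import math
from typing import Callable, List, Sequence, Tuple

Image = Sequence[float]


def sign_vector(gradient: Sequence[float]) -> List[int]:
    """White-box feature: element-wise sign (-1, 0, or +1) of a flattened gradient."""
    return [(g > 0) - (g < 0) for g in gradient]


def prediction_difference_features(
    predict: Callable[[Image], List[float]],
    pairs: Sequence[Tuple[Image, Image]],
) -> List[float]:
    """Black-box feature: concatenated prediction differences.

    Each pair holds (augmented transformed view, matching original view);
    the per-class differences of all pairs are concatenated into one vector.
    """
    features: List[float] = []
    for transformed_view, original_view in pairs:
        p_transformed = predict(transformed_view)
        p_original = predict(original_view)
        features.extend(a - b for a, b in zip(p_transformed, p_original))
    return features


def toy_predict(image: Image) -> List[float]:
    """Hypothetical two-class model standing in for a queried suspicious model."""
    s = sum(image)
    exps = [math.exp(s), math.exp(-s)]
    total = sum(exps)
    return [v / total for v in exps]


# Made-up inputs: one flattened gradient and two (transformed, original) view pairs.
print(sign_vector([0.3, -2.0, 0.0]))
pairs = [([0.2, 0.1], [0.0, 0.0]), ([0.1, 0.3], [0.0, 0.1])]
print(prediction_difference_features(toy_predict, pairs))
```

In a real pipeline the gradient would come from the suspicious model's loss on a transformed image and `predict` would wrap queries to that model; the resulting vectors are what the meta-classifier consumes.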

Theorems & Definitions (8)

  • Definition 1: Two Necessary Requirements
  • Definition 2: Inherent and External Features
  • Example 1
  • Definition 3: White-box Verification
  • Definition 4: Black-box Verification
  • Theorem 1
  • Theorem 1
  • proof