Holmes: Towards Effective and Harmless Model Ownership Verification to Personalized Large Vision Models via Decoupling Common Features

Linghui Zhu, Yiming Li, Haiqin Weng, Yan Liu, Tianwei Zhang, Shu-Tao Xia, Zhi Wang

TL;DR

Holmes verifies ownership of personalized large vision models against model-stealing attacks without modifying the victim model. It decouples shared common features from dataset-specific ones by constructing a poisoned shadow model and a benign shadow model, trains a lightweight meta-classifier on the output differences between the shadow models and the victim model, and performs verification through a hypothesis test. The approach achieves robust, harmless ownership verification across diverse attacks and datasets (CIFAR-10, ImageNet) and extends to captioning tasks, outperforming existing watermarking and fingerprinting baselines while avoiding misjudgments when models share common features. The result is a practical, extensible framework for protecting private fine-tuned models in real-world deployments, with verification reliability quantified by the hypothesis test.

Abstract

Large vision models (LVMs) achieve remarkable performance in various downstream tasks, primarily by personalizing pre-trained models through fine-tuning with private and valuable local data, which makes the personalized model valuable intellectual property. As in the era of traditional DNNs, model stealing attacks pose significant risks to LVMs. However, this paper reveals that most existing defense methods, which were developed for traditional DNNs and typically designed for models trained from scratch, either introduce additional security risks, are prone to misjudgment, or are even ineffective for fine-tuned models. To alleviate these problems, this paper proposes a harmless model ownership verification method for personalized LVMs by decoupling similar common features. In general, our method consists of three main stages. In the first stage, we create shadow models that retain common features of the victim model while disrupting dataset-specific features. We represent the dataset-specific features of the victim model by computing the output differences between the shadow and victim models, without altering the victim model or its training process. In the second stage, a meta-classifier is trained to identify stolen models by determining whether suspicious models contain the dataset-specific features of the victim. In the third stage, we conduct model ownership verification via a hypothesis test to mitigate randomness and enhance robustness. Extensive experiments on benchmark datasets verify the effectiveness of the proposed method in detecting different types of model stealing simultaneously. Our code is available at https://github.com/zlh-thu/Holmes.
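The output-difference representation and the final verification stage can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the meta-classifier is assumed to have already produced one binary flag per verification sample (1 meaning "contains the victim's dataset-specific features"), and an exact one-sided binomial test stands in for the paper's test statistic, which this summary does not spell out.

```python
from math import comb


def output_difference(victim_out, shadow_out):
    """Element-wise difference between two models' output vectors for one input."""
    if len(victim_out) != len(shadow_out):
        raise ValueError("output vectors must have the same length")
    return [v - s for v, s in zip(victim_out, shadow_out)]


def binomial_p_value(flags, null_rate):
    """Exact one-sided p-value: P(at least k flags out of m) under the null flag rate.

    Assumption: `flags` holds the meta-classifier's 0/1 decisions on m samples.
    """
    m = len(flags)
    k = sum(flags)
    return sum(
        comb(m, i) * null_rate**i * (1 - null_rate) ** (m - i)
        for i in range(k, m + 1)
    )


def verify_ownership(flags, null_rate=0.5, alpha=0.05):
    """Treat the suspicious model as stolen only if the null hypothesis is rejected.

    `null_rate` and `alpha` are illustrative defaults, not values from the paper.
    """
    return binomial_p_value(flags, null_rate) < alpha
```

Testing over many samples rather than trusting a single meta-classifier decision is what lets the verification mitigate randomness: a handful of spurious flags does not push the p-value below the significance level.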

Paper Structure

This paper contains 41 sections, 2 theorems, 23 equations, 3 figures, 15 tables.

Key Result

Theorem 1

Let $\bm{X}$ be a random variable representing samples from $\mathcal{D}_s$. Assume that $\mu_{B} \triangleq \mathbb{P}(C(\bm{O}_s(\bm{X})) = -1) < \beta$. We claim that the verification process can reject the null hypothesis $H_0$ at the significance level $\alpha$ if the identification success rate … where $\Delta = t_{1-\alpha}^4 + 4t_{1-\alpha}^2(m-1)(\beta+\tau)(1-\beta - \tau)$, $t_{1-\alpha}$ …
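The quantity $\Delta$ is the only part of the theorem that survives intact in this excerpt, and it can be evaluated directly. The sketch below is a plain transcription of that formula; the function name is hypothetical, and $t_{1-\alpha}$ is passed in as a number because the excerpt is cut off before defining it. Reading $m$ as the number of verification samples is likewise an assumption.

```python
def theorem_delta(t_quantile, m, beta, tau):
    """Evaluate Delta = t^4 + 4 t^2 (m - 1)(beta + tau)(1 - beta - tau).

    t_quantile: the value of t_{1-alpha}, supplied by the caller (not derived here).
    m:          assumed to be the number of samples used in the test.
    beta, tau:  the constants appearing in the theorem statement.
    """
    p = beta + tau
    return t_quantile**4 + 4 * t_quantile**2 * (m - 1) * p * (1 - p)


# Illustrative inputs only; these are not values reported in the paper.
example = theorem_delta(t_quantile=2.0, m=101, beta=0.1, tau=0.1)
```

Since the condition on the identification success rate is truncated in the excerpt, the sketch stops at $\Delta$ and does not attempt to reconstruct the bound itself.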

Figures (3)

  • Figure 1: Limitations of existing model ownership verification and our solution via decoupling common features. Existing model ownership verification methods suffer from three main limitations: (1) Introducing harmful modifications (e.g., backdoors) that compromise model reliability; (2) Vulnerability to misjudgments when models share common features; (3) Ineffectiveness for fine-tuned models since most defense methods are primarily designed for models trained from scratch. We propose Holmes to systematically alleviate these limitations via feature decoupling: (1) Ensuring harmlessness through non-invasive verification that makes use of shadow models, avoiding modifications to the original victim model; (2) Achieving robustness through an ownership meta-classifier to identify dataset-specific features; (3) Ensuring reliability for fine-tuned models by leveraging the learned inherent dataset-specific features, eliminating the need for external feature embedding or artificial modifications during fine-tuning.
  • Figure 2: The main pipeline of Holmes. Step 1. Creating Shadow Models: Two shadow models are generated to represent dataset-specific features. (a) The poisoned shadow model disrupts dataset-specific features while preserving common features through a backdoor attack. (b) The benign shadow model introduces distinct dataset-specific features by fine-tuning on a filtered dataset with similar common features. Step 2. Training Ownership Meta-Classifier: A meta-classifier is trained using the output differences between the shadow models and the victim model to verify ownership. Step 3. Ownership Verification with Hypothesis Test: Model ownership is verified through a hypothesis test to mitigate randomness and further enhance robustness. The verification process requires no modifications to the victim model, ensuring the harmlessness of Holmes.
  • Figure 3: Images involved in different defenses. (a) original image; (b) poisoned image in BadNets; (c) poisoned image in GM; (d) out-of-distribution watermark image from SVHN dataset in EWE; (e) poisoned image in PTYNet; (f) noised image in UAE; (g) noised image in UAPs; (h) noised image in Metafinger; (i) transformed image in MOVE; (j) poisoned image in Holmes for creating the poisoned shadow model.

Theorems & Definitions (8)

  • Definition 1
  • Theorem 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 1
  • Theorem 1
  • proof