ProDiF: Protecting Domain-Invariant Features to Secure Pre-Trained Models Against Extraction

Tong Zhou, Shijin Duan, Gaowen Liu, Charles Fleming, Ramana Rao Kompella, Shaolei Ren, Xiaolin Xu

TL;DR

ProDiF secures on-device pre-trained DNNs by blocking both source-domain inference and cross-domain transfer through targeted weight-space manipulation. It identifies domain-invariant features by generating auxiliary domains with a conditional Wasserstein Auto-Encoder, ranks filters by transferability, and perturbs the weights of the most transferable filters, while the benign versions are stored in a Trusted Execution Environment (TEE) to preserve performance for authorized users. A bi-level optimization framework maximizes resilience against adaptive fine-tuning by simulating attacker updates on auxiliary domains and enforcing degradation across both source and target domains. Experimental results across digits, natural images, and VisDA show near-random source-domain accuracy and a 74.65% reduction in cross-domain transferability, with a small secure-memory footprint, offering a practical, comprehensive defense for on-device pre-trained weights. The approach demonstrates a novel use of weight-space manipulation to safeguard intellectual property while maintaining usability for legitimate users.
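The rank, perturb, and store-in-TEE steps described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the plain dictionary standing in for TEE secure memory, the precomputed `scores` list standing in for the transferability metric, and the Gaussian noise standing in for the perturbation that ProDiF obtains through bi-level optimization.

```python
import random

def select_critical_filters(scores, k):
    """Return the indices of the k filters with the highest score (hypothetical metric)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def protect(filters, scores, k, noise_scale=1.0, seed=0):
    """Perturb the k most transferable filters and keep their originals aside.

    `secure_store` is a plain dict used as a stand-in for TEE secure memory.
    Gaussian noise is an illustrative placeholder, not the paper's perturbation.
    """
    rng = random.Random(seed)
    critical = select_critical_filters(scores, k)
    secure_store = {i: list(filters[i]) for i in critical}
    protected = [list(f) for f in filters]
    for i in critical:
        protected[i] = [w + rng.gauss(0.0, noise_scale) for w in filters[i]]
    return protected, secure_store

def restore(protected, secure_store):
    """Authorized path: swap the stored benign weights back in."""
    restored = [list(f) for f in protected]
    for i, weights in secure_store.items():
        restored[i] = list(weights)
    return restored
```

In this sketch only the `k` selected filters occupy the stand-in secure store, which mirrors the small secure-memory footprint the summary mentions; an attacker who copies `protected` sees altered weights for exactly those filters.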

Abstract

Pre-trained models are valuable intellectual property, capturing both domain-specific and domain-invariant features within their weight spaces. However, model extraction attacks threaten these assets by enabling unauthorized source-domain inference and facilitating cross-domain transfer via the exploitation of domain-invariant features. In this work, we introduce **ProDiF**, a novel framework that leverages targeted weight space manipulation to secure pre-trained models against extraction attacks. **ProDiF** quantifies the transferability of filters and perturbs the weights of critical filters in unsecured memory, while preserving actual critical weights in a Trusted Execution Environment (TEE) for authorized users. A bi-level optimization further ensures resilience against adaptive fine-tuning attacks. Experimental results show that **ProDiF** reduces source-domain accuracy to near-random levels and decreases cross-domain transferability by 74.65%, providing robust protection for pre-trained models. This work offers comprehensive protection for pre-trained DNN models and highlights the potential of weight space manipulation as a novel approach to model security.
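To make the bi-level idea concrete, here is a toy sketch under strong assumptions: a single scalar weight, a quadratic loss per auxiliary domain, and a finite-difference outer gradient. The inner level simulates an attacker's fine-tuning steps; the outer level adjusts a bounded perturbation `delta` so that the attacker's loss after fine-tuning stays high. All names and hyperparameters are hypothetical, and the sketch covers only fine-tuning resilience, not the source-domain degradation term or the filter-level setting of the actual method.

```python
def attacker_finetune(w0, target, steps=5, lr=0.1):
    """Inner level: gradient descent on the toy loss (w - target)^2."""
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * (w - target)
    return w

def post_finetune_loss(delta, w_clean, aux_targets, steps=5, lr=0.1):
    """Mean loss left after the simulated attacker fine-tunes from w_clean + delta."""
    losses = []
    for target in aux_targets:
        w = attacker_finetune(w_clean + delta, target, steps, lr)
        losses.append((w - target) ** 2)
    return sum(losses) / len(losses)

def optimize_perturbation(w_clean, aux_targets, budget=1.0,
                          outer_steps=100, outer_lr=0.5, eps=1e-4):
    """Outer level: projected gradient ascent on delta within [-budget, budget]."""
    delta = 0.01  # small non-zero start so the ascent direction is defined
    for _ in range(outer_steps):
        up = post_finetune_loss(delta + eps, w_clean, aux_targets)
        down = post_finetune_loss(delta - eps, w_clean, aux_targets)
        grad = (up - down) / (2.0 * eps)  # central finite difference
        delta = max(-budget, min(budget, delta + outer_lr * grad))
    return delta
```

Because the attacker takes only a few steps, the post-fine-tuning loss still depends on the starting point, and that dependence is what the outer ascent exploits. A practical implementation would differentiate through the unrolled inner steps with an autodiff framework rather than use finite differences.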

Paper Structure

This paper contains 28 sections, 10 equations, 10 figures, 7 tables, 1 algorithm.

Figures (10)

  • Figure 1: The workflow of protection and attack. For the model trained on the source domain, ProDiF identifies critical filters (e.g., #3 and #6) leveraging auxiliary domains, then employs bi-level optimization to perturb these weights (red dots) to generate the protected model. This model, stored in unsecured memory, effectively prevents attackers from source-domain inference and cross-domain transfer. The weight perturbations of critical filters stored in TEE secure memory can correct the features influenced by perturbed weights, enabling users to attain high performance.
  • Figure 2: The results of CIFAR10, STL10, and VisDA. The lower accuracy indicates better protection against cross-domain transfer.
  • Figure 3: High-transferability filters mainly extract edge information (orange box, channel #2), whereas low-transferability filters capture more diverse details (blue box, channel #48). With ProDiF protection (red box), the domain-invariant feature will be perturbed from (e) to (g).
  • Figure 4: The visualization of the digits datasets.
  • Figure 5: The examples of auxiliary domains for MNIST.
  • ...and 5 more figures