ProDiF: Protecting Domain-Invariant Features to Secure Pre-Trained Models Against Extraction
Tong Zhou, Shijin Duan, Gaowen Liu, Charles Fleming, Ramana Rao Kompella, Shaolei Ren, Xiaolin Xu
TL;DR
ProDiF secures on-device pre-trained DNNs against model extraction by blocking both unauthorized source-domain inference and cross-domain transfer through targeted weight-space manipulation. It identifies domain-invariant features by generating auxiliary domains with a conditional Wasserstein Auto-Encoder, ranks filters by transferability, and perturbs the weights of the most transferable filters; the benign versions are stored in a Trusted Execution Environment (TEE) to preserve performance for authorized users. A bi-level optimization makes the perturbation resilient to adaptive fine-tuning: it simulates attacker updates on the auxiliary domains and enforces degradation on both source and target domains. Experiments on digits, natural images, and VisDA show near-random source-domain accuracy and a 74.65% reduction in cross-domain transferability, with a small secure-memory footprint. The result is a practical defense for on-device pre-trained weights that uses weight-space manipulation to safeguard intellectual property while maintaining usability for legitimate users.
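The bi-level structure described above can be written generically as follows. This is only an illustrative sketch, not the paper's exact objective: the symbols are assumptions introduced here, with $\theta$ the original weights, $m$ a binary mask over the selected filters, $\delta$ the perturbation, $D_S$ the source domain, $D_A$ the auxiliary domains, $\mathcal{L}$ a task loss, and $\mathrm{FT}_T$ a simulated attacker running $T$ fine-tuning steps.

$$
\max_{\delta}\; \mathcal{L}\big(\theta + m \odot \delta;\, D_S\big) + \mathcal{L}\big(\hat{\theta}(\delta);\, D_A\big)
\quad \text{where} \quad
\hat{\theta}(\delta) = \mathrm{FT}_T\big(\theta + m \odot \delta;\, D_A\big)
$$

The inner step models the attacker adapting the extracted weights, and the outer step chooses a perturbation that keeps the loss high both before and after that adaptation.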
Abstract
Pre-trained models are valuable intellectual property, capturing both domain-specific and domain-invariant features within their weight spaces. However, model extraction attacks threaten these assets by enabling unauthorized source-domain inference and facilitating cross-domain transfer via the exploitation of domain-invariant features. In this work, we introduce **ProDiF**, a novel framework that leverages targeted weight space manipulation to secure pre-trained models against extraction attacks. **ProDiF** quantifies the transferability of filters and perturbs the weights of critical filters in unsecured memory, while preserving actual critical weights in a Trusted Execution Environment (TEE) for authorized users. A bi-level optimization further ensures resilience against adaptive fine-tuning attacks. Experimental results show that **ProDiF** reduces source-domain accuracy to near-random levels and decreases cross-domain transferability by 74.65%, providing robust protection for pre-trained models. This work offers comprehensive protection for pre-trained DNN models and highlights the potential of weight space manipulation as a novel approach to model security.
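The split between perturbed weights in unsecured memory and original weights in the TEE can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `protect_filters` and `restore_filters`, the per-filter `scores`, and the use of a plain dictionary as a stand-in for secure storage are hypothetical, and random Gaussian noise replaces the optimized perturbation that the framework actually learns.

```python
import random


def protect_filters(filters, scores, k, noise_scale=1.0, seed=0):
    """Perturb the k highest-scoring filters and keep their originals aside.

    filters: list of filters, each a flat list of float weights.
    scores:  hypothetical per-filter transferability scores (higher = more transferable).
    """
    rng = random.Random(seed)
    # Indices of the k filters with the highest score.
    top = sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)[:k]
    # Stand-in for the TEE: original weights of the selected filters only.
    secure_store = {i: list(filters[i]) for i in top}
    # Copy that would live in unsecured memory.
    public = [list(f) for f in filters]
    for i in top:
        # Placeholder perturbation: Gaussian noise, not an optimized one.
        public[i] = [w + noise_scale * rng.gauss(0.0, 1.0) for w in public[i]]
    return public, secure_store


def restore_filters(public, secure_store):
    """Rebuild the benign model for an authorized user from the secure copy."""
    restored = [list(f) for f in public]
    for i, original in secure_store.items():
        restored[i] = list(original)
    return restored


filters = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
scores = [0.1, 0.9, 0.3, 0.7]
public, store = protect_filters(filters, scores, k=2)
restored = restore_filters(public, store)
```

Only the selected filters need to be kept in secure storage, which is consistent with the small secure-memory footprint reported for the approach.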
