Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing

Ruyi Ding, Tong Zhou, Lili Su, Aidong Adam Ding, Xiaolin Xu, Yunsi Fei

TL;DR

EncoderLock is a novel applicability authorization method that protects pre-trained encoders from malicious probing: the protected encoder yields poor performance on specified prohibited domains while maintaining its utility on authorized ones.

Abstract

Adapting pre-trained deep learning models to customized tasks has become a popular choice for developers to cope with limited computational resources and data volume. More specifically, probing (training a downstream head on a pre-trained encoder) has been widely adopted in transfer learning, which helps to prevent overfitting and catastrophic forgetting. However, such generalizability of pre-trained encoders raises concerns about the potential misuse of probing for harmful intentions, such as discriminatory speculation and warfare applications. In this work, we introduce EncoderLock, a novel applicability authorization method designed to protect pre-trained encoders from malicious probing, i.e., yielding poor performance on specified prohibited domains while maintaining their utility in authorized ones. Achieving this balance is challenging because of the opposing optimization objectives and the variety of downstream heads that adversaries can utilize adaptively. To address these challenges, EncoderLock employs two techniques: domain-aware weight selection and updating to restrict applications on prohibited domains/tasks, and a self-challenging training scheme that iteratively strengthens resistance against any potential downstream classifiers that adversaries may apply. Moreover, recognizing the potential lack of data from prohibited domains in practical scenarios, we introduce three EncoderLock variants with different levels of data accessibility: supervised (prohibited domain data with labels), unsupervised (prohibited domain data without labels), and zero-shot (no data or labels available). We verify EncoderLock's effectiveness and practicality with a real-world pre-trained Vision Transformer (ViT) encoder from Facebook. These results underscore the valuable contributions EncoderLock brings to the development of responsible AI.

Paper Structure

This paper contains 44 sections, 9 equations, 22 figures, 18 tables, 2 algorithms.

Figures (22)

  • Figure 1: Applicability Authorization with EncoderLock: Fixed pre-trained encoders accept user inputs and return representations. Users can utilize them for various customized tasks by probing with downstream heads. EncoderLock aims to prevent malicious probing on pre-defined prohibited domains, which may have different levels of data accessibility, marked by different colors.
  • Figure 2: Overview of the proposed EncoderLock framework and paper organization. The procedure in round $r$ has two steps: (1) the domain-aware critical weight selection algorithm takes data batches $\mathcal{B}_\mathcal{S}$ and $\mathcal{B}_\mathcal{T}$ from the authorized source dataset $\mathcal{D}_\mathcal{S}$ and the prohibited target dataset $\mathcal{D}_\mathcal{T}$, respectively, calculates the weight importance from the gradients of the losses $L_\mathcal{S}$ and $L_\mathcal{T}$, and chooses the critical weights $N_r$ to update in round $r$ (the specific losses depend on the level of accessibility of the target domain); (2) the EncoderLock weight update algorithm, with three variants for the three levels of target-domain data accessibility, uses the supervised EncoderLock loss $L_\text{el}$, the unsupervised contrastive loss $L_\text{el}^{\text{cont}}$, and the generated synthetic dataset $D_\mathcal{T}'$, respectively.
  • Figure 3: Visualization of weight importance in a pre-trained model. The X-Y plane represents the weight matrix of a selected dense layer in a model trained on MNIST and probed for USPS. The color and height indicate each weight's importance to the output (the higher and darker, the more important).
  • Figure 4: Design motivation of unsupervised EncoderLock
  • Figure 5: Building synthetic datasets for zero-shot EncoderLock
  • ...and 17 more figures
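
The critical-weight selection step described in the Figure 2 caption (score weights using gradients of a source loss and a target loss, then pick a small set to update) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not the paper's exact criterion: the "encoder" is a single linear map, both losses are mean squared errors, importance is the first-order estimate |w · ∂L/∂w|, and the selection score is target importance minus `lam` times source importance. The names `mse_grad`, `select_critical_weights`, `n_select`, and `lam` are hypothetical.

```python
import numpy as np

def mse_grad(W, X, Y):
    """Gradient of 0.5 * mean squared error of the linear map X @ W w.r.t. W."""
    return X.T @ (X @ W - Y) / X.shape[0]

def select_critical_weights(W, grad_src, grad_tgt, n_select, lam=1.0):
    """Return a boolean mask marking n_select weights to update this round.

    Assumed scoring rule (illustrative only): prefer weights that matter
    a lot for the prohibited target loss and little for the authorized
    source loss.
    """
    imp_src = np.abs(W * grad_src)   # first-order importance on source batch
    imp_tgt = np.abs(W * grad_tgt)   # first-order importance on target batch
    score = imp_tgt - lam * imp_src
    # Indices of the n_select largest scores (order among them is arbitrary).
    top = np.argpartition(score.ravel(), -n_select)[-n_select:]
    mask = np.zeros(W.size, dtype=bool)
    mask[top] = True
    return mask.reshape(W.shape)

# Toy batches standing in for B_S (authorized) and B_T (prohibited).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
X_src, Y_src = rng.normal(size=(16, 8)), rng.normal(size=(16, 4))
X_tgt, Y_tgt = rng.normal(size=(16, 8)), rng.normal(size=(16, 4))

g_src = mse_grad(W, X_src, Y_src)
g_tgt = mse_grad(W, X_tgt, Y_tgt)
mask = select_critical_weights(W, g_src, g_tgt, n_select=5)
print(int(mask.sum()))  # 5 weights selected
```

In a real setting the gradients would come from automatic differentiation through the full encoder, and only the masked weights would then be modified by the weight update step; the mask keeps the change to the encoder small, which is one plausible way to limit damage to the authorized domain.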