
Examining the Threat Landscape: Foundation Models and Model Stealing

Ankita Raj, Deepankar Varma, Chetan Arora

TL;DR

The paper analyzes model stealing risks for vision systems built on foundation models. It shows that victims fine-tuned from ViTs are more susceptible to theft than traditional CNN baselines when the attacker uses a strong foundation-model thief. Using proxy data from ImageNet and a range of ViT and ResNet thieves, the study reports high agreement between victim and thief predictions: up to 94.28% on CIFAR-10 for a ViT-L/16 victim, versus 73.20% for a ResNet-18 victim attacked by the same ViT-L/16 thief. It shows that the richer representations of foundation models, while boosting downstream accuracy, also facilitate theft, especially when both attacker and victim rely on strong backbones. The results call for security-aware deployment of foundation-model-based APIs and for the development of defenses against model stealing in MLaaS settings.

Abstract

Foundation models (FMs) for computer vision learn rich and robust representations, enabling their adaptation to task/domain-specific deployments with little to no fine-tuning. However, we posit that the very same strength can make applications based on FMs vulnerable to model stealing attacks. Through empirical analysis, we reveal that models fine-tuned from FMs are more susceptible to model stealing than conventional vision architectures like ResNets. We hypothesize that this behavior is due to the comprehensive encoding of visual patterns and features learned by FMs during pre-training, which are accessible to both the attacker and the victim. We report that an attacker is able to obtain 94.28% agreement (matched predictions with the victim) for a Vision Transformer based victim model (ViT-L/16) trained on the CIFAR-10 dataset, compared to only 73.20% agreement for a ResNet-18 victim, when using ViT-L/16 as the thief model. We arguably show, for the first time, that utilizing FMs for downstream tasks may not be the best choice for deployment in commercial APIs due to their susceptibility to model theft. We thereby alert model owners to the associated security risks and highlight the need for robust security measures to safeguard such models against theft. Code is available at https://github.com/rajankita/foundation_model_stealing.
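Agreement, as used above, is the fraction of inputs on which the thief's predicted label matches the victim's. The snippet below is a minimal sketch of that metric, assuming both models return hard class labels for the same set of held-out inputs; the function name `agreement` and the example labels are illustrative only and are not taken from the paper or its code.

```python
import numpy as np


def agreement(victim_preds, thief_preds):
    """Fraction of inputs on which thief and victim predict the same label."""
    victim_preds = np.asarray(victim_preds)
    thief_preds = np.asarray(thief_preds)
    if victim_preds.shape != thief_preds.shape:
        raise ValueError("prediction arrays must have the same shape")
    # Element-wise label match, averaged over all evaluated inputs.
    return float(np.mean(victim_preds == thief_preds))


# Illustrative labels only (not results from the paper).
victim = [3, 1, 4, 1, 5, 9, 2, 6]
thief = [3, 1, 4, 0, 5, 9, 2, 7]
print(f"agreement = {agreement(victim, thief):.2%}")  # 6 of 8 match -> 75.00%
```

Note that agreement is measured against the victim's outputs, not the ground truth, so a thief can score highly even on inputs the victim misclassifies.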

Paper Structure

This paper contains 22 sections, 11 figures, 3 tables.

Figures (11)

  • Figure 1: (a) Standard model stealing setup: An adversary picks images from a proxy dataset and queries the victim model to obtain their labels. This labeled proxy dataset is then used to train the thief model. (b) Victims derived from foundation models are more prone to stealing: We steal three victim models trained on the CIFAR-10 dataset using a ViT-L/16 thief. Although stronger victims based on foundation models like ViT-L/16 achieve higher accuracy, the agreement between the victim's and thief's predictions also increases, underlining the increased severity of the threat.
  • Figure 2: Agreement between multiple linear-probed victims (trained on three datasets) and a linear-probed ViT-L/16 thief.
  • Figure 3: Agreement for thieves based on foundation models (ViTs) vs. a ResNet-34 based thief. Victim models are linear-probed.
  • Figure 4: Agreement for thieves based on foundation models (ViTs) vs. a ResNet-34 based thief. Victim models are fully fine-tuned.
  • Figure 5: t-SNE visualizations [van2008visualizing] of embeddings for backbone models (top row) and the corresponding victim models (bottom row) trained on the CIFAR-10 dataset using linear probing. Observe that the clusters from the foundation-model backbones (ViT-B/16 and ViT-L/16) are much better separated than those from the ResNets.
  • ...and 6 more figures
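The stealing setup in Figure 1(a), combined with the linear-probed thieves of Figures 2 and 3, can be sketched as a toy simulation. This is a minimal sketch under loudly stated assumptions, not the authors' implementation: `victim_api` is a hypothetical label-only oracle (a fixed random linear classifier), random vectors stand in for ImageNet proxy images, and the hypothetical `thief_backbone` is the identity, modeling the case where thief and victim share the same representation. The thief is a softmax-regression linear probe trained by full-batch gradient descent on victim-labeled proxy data.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES, FEAT_DIM, NUM_PROXY = 10, 32, 2000

# Hypothetical victim API: a fixed linear classifier that returns only hard
# labels. The attacker never sees _victim_w directly.
_victim_w = rng.normal(size=(FEAT_DIM, NUM_CLASSES))


def victim_api(features):
    return np.argmax(features @ _victim_w, axis=1)


# Hypothetical frozen thief backbone: identity, i.e. thief and victim share
# the same representation (assumption for illustration only).
def thief_backbone(x):
    return x


def train_linear_probe(feats, labels, num_classes, lr=0.5, epochs=300):
    """Softmax regression (linear probe) trained with full-batch gradient descent."""
    n, d = feats.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = feats @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b


# 1) Attacker samples proxy inputs (random vectors stand in for images).
proxy = rng.normal(size=(NUM_PROXY, FEAT_DIM))
# 2) Query the victim to label the proxy set.
proxy_labels = victim_api(proxy)
# 3) Train the thief's linear probe on the victim-labeled proxy set.
w, b = train_linear_probe(thief_backbone(proxy), proxy_labels, NUM_CLASSES)

# 4) Measure thief/victim agreement on fresh inputs.
test = rng.normal(size=(500, FEAT_DIM))
thief_preds = np.argmax(thief_backbone(test) @ w + b, axis=1)
agreement = float(np.mean(thief_preds == victim_api(test)))
print(f"agreement = {agreement:.2%}")
```

Because the toy thief shares the victim's feature space exactly, this simulation only illustrates the mechanics of querying, labeling, and training; it does not reproduce the paper's numbers or its comparison between ViT and ResNet victims.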