
Stealing the Invisible: Unveiling Pre-Trained CNN Models through Adversarial Examples and Timing Side-Channels

Shubhi Shukla, Manaar Alam, Pabitra Mitra, Debdeep Mukhopadhyay

TL;DR

This work reveals a dual-path model-stealing threat against MLaaS that combines adversarial-image fingerprints with timing side-channels. A two-stage pipeline first builds architecture-specific templates from the misclassifications that adversarial images induce across weight-variant models, then uses remote inference timing to prune the candidate architectures before final identification via template matching and majority voting. Experiments on 27 pre-trained CNN and ViT models with CIFAR-10 show 88.8% identification accuracy with fewer than 20 queries and near-complete (98.5%) shortlisting accuracy. The findings expose practical vulnerabilities in outsourced ML backends and underscore the need for defenses such as rate limiting and model watermarking to protect intellectual property and service integrity.
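The template-matching and majority-voting stage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each template is assumed to be a list holding one predicted label per adversarial image, obtained by a majority vote over weight-variant models of one architecture, and the helper names `build_template` and `identify` are hypothetical. All labels in the usage lines are made up.

```python
from collections import Counter
from typing import Dict, List, Sequence


def build_template(variant_predictions: List[Sequence[int]]) -> List[int]:
    """Hypothetical helper: collapse the labels that several weight-variant
    models of ONE architecture assign to the same adversarial images into a
    single fingerprint by keeping, per image, the label most variants agree on."""
    n_images = len(variant_predictions[0])
    template = []
    for i in range(n_images):
        counts = Counter(pred[i] for pred in variant_predictions)
        template.append(counts.most_common(1)[0][0])
    return template


def identify(observed: Sequence[int],
             templates: Dict[str, List[int]],
             shortlist: Sequence[str]) -> str:
    """Hypothetical helper: each adversarial image casts a vote for every
    shortlisted architecture whose template label matches the label the
    target returned; the architecture with the most votes is reported."""
    votes = Counter({arch: 0 for arch in shortlist})
    for i, label in enumerate(observed):
        for arch in shortlist:
            if templates[arch][i] == label:
                votes[arch] += 1
    return votes.most_common(1)[0][0]


# Toy usage with made-up labels for four adversarial images.
templates = {
    "resnet18": build_template([[3, 5, 1, 8], [3, 5, 2, 8], [3, 4, 1, 8]]),
    "vgg11": build_template([[0, 5, 1, 2], [0, 5, 1, 2]]),
    "alexnet": [3, 9, 9, 9],
}
# "alexnet" is assumed to have been pruned by the timing stage.
print(identify([3, 5, 2, 8], templates, ["resnet18", "vgg11"]))  # → resnet18
```

Counting matches per image amounts to a Hamming-style similarity between the observed labels and each template; restricting the vote to the timing shortlist reduces the number of templates the observed labels have to discriminate between.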

Abstract

Machine learning, with its myriad applications, has become an integral component of numerous technological systems. A common practice in this domain is transfer learning, where a publicly available pre-trained model architecture is fine-tuned for a specific task. As Machine Learning as a Service (MLaaS) platforms increasingly use pre-trained models in their backends, it is crucial to safeguard these architectures and understand their vulnerabilities. In this work, we present an approach based on the observation that the classification patterns of adversarial images can be used to steal models. Furthermore, combining adversarial image classifications with timing side-channels yields a model-stealing method. Our approach, designed for typical user-level access in remote MLaaS environments, exploits the varying misclassifications of adversarial images across different models to fingerprint several renowned Convolutional Neural Network (CNN) and Vision Transformer (ViT) architectures. We profile remote model inference times to reduce the number of adversarial images needed, thereby decreasing the number of queries required. We present results on 27 pre-trained models of different CNN and ViT architectures using the CIFAR-10 dataset and demonstrate an accuracy of 88.8% while keeping the query budget under 20.
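The timing-based pruning described in the abstract could look roughly like the sketch below. It assumes the attacker has profiled, offline, a mean and standard deviation of inference latency for each candidate architecture. The functions `measure_latency_ms` and `shortlist_by_timing`, the `query_fn` callable standing in for the remote prediction call, and the ranking rule (profiled standard deviations away from the median observed latency) are all hypothetical choices for illustration; the paper's actual shortlisting criterion may differ, and the profile numbers are made up.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence, Tuple


def measure_latency_ms(query_fn: Callable[[object], object],
                       inputs: Sequence[object]) -> List[float]:
    """Wall-clock time of each query in milliseconds.
    `query_fn` is a hypothetical stand-in for the MLaaS prediction call."""
    timings = []
    for x in inputs:
        start = time.perf_counter()
        query_fn(x)
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def shortlist_by_timing(observed_ms: Sequence[float],
                        profiles: Dict[str, Tuple[float, float]],
                        k: int = 3) -> List[str]:
    """Rank candidate architectures by how many profiled standard deviations
    their mean latency lies from the target's median latency and keep the
    k closest. `profiles` maps arch -> (mean_ms, std_ms)."""
    target = statistics.median(observed_ms)

    def distance(arch: str) -> float:
        mean, std = profiles[arch]
        return abs(target - mean) / max(std, 1e-9)  # guard against std == 0

    return sorted(profiles, key=distance)[:k]


# Toy profiles (made-up numbers, not measurements from the paper).
profiles = {
    "alexnet": (2.0, 0.2),
    "resnet18": (4.0, 0.3),
    "vgg11": (6.5, 0.4),
    "densenet121": (12.0, 1.0),
}
observed = [4.1, 3.9, 4.4, 6.0, 4.2]  # one slow outlier, e.g. network jitter
print(shortlist_by_timing(observed, profiles, k=2))  # → ['resnet18', 'vgg11']
```

The median is used rather than the mean so that a single slow query, such as one delayed by network jitter, barely moves the latency estimate.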

Paper Structure (12 sections, 6 figures, 1 table, 2 algorithms)

Figures (6)

  • Figure 1: Varying classifications by 27 pre-trained models of 5 adversarial images, generated using FGSM, PGD, and BIM attacks and belonging to 5 different classes of the CIFAR-10 dataset
  • Figure 2: Classification of adversarial images generated using PGD by models of the same architecture but different weight parameters
  • Figure 3: Comparison of class-wise Difference of Means (DoMs) of classification probabilities between (a) AlexNet and ResNet18, (b) AlexNet and VGG11, and (c) ResNet18 and VGG11 with intra-architecture DoMs
  • Figure 4: Adversarial image selection for model profiling in a black-box setup
  • Figure 5: (a) Attack methodology (b) Shortlisting correctness based on inference time for 5 test CNN target models of each architecture with varying weight parameters trained using CIFAR-10.
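One plausible reading of the class-wise Difference of Means (DoM) compared in Figure 3 is sketched below: for each class, take the absolute difference between two models' mean softmax probabilities over the same adversarial images, then check whether the gap between architectures exceeds the gap between weight variants of one architecture. The exact definition used in the paper (for example, whether absolute values are taken or which images are averaged) is an assumption here, as are the helper name `class_wise_dom` and all probability values.

```python
from typing import List, Sequence


def class_wise_dom(probs_a: Sequence[Sequence[float]],
                   probs_b: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper. probs_x[i][c] is model x's softmax probability
    for class c on adversarial image i. Returns, for every class c, the
    absolute difference between the two models' mean probabilities."""
    n_classes = len(probs_a[0])
    doms = []
    for c in range(n_classes):
        mean_a = sum(row[c] for row in probs_a) / len(probs_a)
        mean_b = sum(row[c] for row in probs_b) / len(probs_b)
        doms.append(abs(mean_a - mean_b))
    return doms


# Made-up probabilities for 2 adversarial images over 3 classes.
alexnet_v1 = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
alexnet_v2 = [[0.6, 0.3, 0.1], [0.7, 0.2, 0.1]]  # same architecture, other weights
resnet18 = [[0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]

intra = class_wise_dom(alexnet_v1, alexnet_v2)
inter = class_wise_dom(alexnet_v1, resnet18)
print(max(intra) < max(inter))  # → True: the inter-architecture gap dominates here
```

If inter-architecture DoMs consistently exceed intra-architecture DoMs, the class-probability profile separates architectures while staying stable across weight variants, which is the property a fingerprint needs.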