A framework for the extraction of Deep Neural Networks by leveraging public data

Soham Pal, Yash Gupta, Aditya Shukla, Aditya Kanade, Shirish Shevade, Vinod Ganapathy

TL;DR

The paper tackles the privacy risk posed by MLaaS by proposing a practical model-extraction framework that leverages large public datasets (universal thieves) and active learning to steal deep neural networks under tight query budgets and without domain knowledge. It applies DeepFool-based Active Learning (DFAL) and proposes an ensemble sampling strategy to efficiently select informative samples from public data, achieving substantial agreement with secret models on image and text tasks. On average, the framework improves on a uniform-noise baseline by 4.70x for image tasks and 2.11x for text tasks, showing that universal thieves combined with iterative, budget-conscious querying can effectively replicate complex DNNs. This work highlights a concrete, scalable threat to proprietary models and motivates the development of defenses and more extraction-resistant architectures.
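
DFAL ranks unlabeled public samples by how small a perturbation DeepFool needs to flip the substitute model's prediction, then spends queries on the samples closest to the decision boundary. The sketch below illustrates that ranking under a simplifying assumption: the substitute is an affine classifier f(x) = Wx + b, for which DeepFool's linearization is exact and one step yields the minimal L2 perturbation. The paper applies DeepFool iteratively to a deep substitute network, and the function names here are hypothetical.

```python
import numpy as np


def deepfool_distance_affine(W, b, x):
    """Minimal L2 perturbation that changes the prediction of f(x) = W @ x + b.

    For an affine multiclass model, DeepFool's step is exact:
    min over k != k_hat of |f_k(x) - f_khat(x)| / ||w_k - w_khat||.
    (Assumption: affine substitute; deep models need the iterative version.)
    """
    scores = W @ x + b
    k_hat = int(np.argmax(scores))
    best = np.inf
    for k in range(W.shape[0]):
        if k == k_hat:
            continue
        norm = np.linalg.norm(W[k] - W[k_hat])
        if norm == 0:
            continue  # no direction separates these two classes
        best = min(best, abs(scores[k] - scores[k_hat]) / norm)
    return best


def dfal_select(W, b, pool, k):
    """Hypothetical helper: indices of the k pool samples nearest the boundary."""
    distances = np.array([deepfool_distance_affine(W, b, x) for x in pool])
    return np.argsort(distances)[:k]


# Toy two-class substitute whose decision boundary is x0 = x1.
W = np.eye(2)
b = np.zeros(2)
pool = np.array([[3.0, 0.0], [1.0, 0.9], [0.0, 2.0], [0.5, 0.2]])
print(dfal_select(W, b, pool, k=2))  # [1 3]
```

The two selected points sit closest to the line x0 = x1, so their labels from the secret model are the most informative for refining the substitute's boundary.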

Abstract

Machine learning models trained on confidential datasets are increasingly being deployed for profit. Machine Learning as a Service (MLaaS) has made such models easily accessible to end-users. Prior work has developed model extraction attacks, in which an adversary extracts an approximation of MLaaS models by making black-box queries to them. However, none of these works satisfies all three essential criteria for practical model extraction: (1) the ability to work on deep learning models, (2) no requirement for domain knowledge, and (3) the ability to work with a limited query budget. We design a model extraction framework that uses active learning and large public datasets to satisfy all three. We demonstrate that this framework can steal deep classifiers trained on a variety of datasets from the image and text domains. By querying a model through black-box access for its top prediction, our framework improves performance over a uniform noise baseline by 4.70x on average for image tasks and 2.11x for text tasks, while using only 30% (30,000 samples) of the public dataset at its disposal.
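
At a high level, a framework of this kind alternates between querying the secret model for top-1 labels on a batch of public data, retraining a substitute on everything labeled so far, and using an active-learning strategy to pick the next batch until the query budget runs out. The Python sketch below shows such a loop under stated assumptions: `secret_model` is a hypothetical black-box oracle, a nearest-centroid classifier stands in for the deep substitute, and margin-based uncertainty sampling stands in for the paper's selection strategies (DFAL or the ensemble). All names are illustrative, not the authors' code.

```python
import numpy as np


def train_centroids(X, y):
    """Toy substitute: one centroid per observed label (stand-in for a DNN)."""
    labels = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in labels])
    return labels, centroids


def _distances(model, X):
    _, centroids = model
    return np.sqrt(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2))


def predict(model, X):
    labels, _ = model
    return labels[np.argmin(_distances(model, X), axis=1)]


def margin_select(model, candidates, k):
    """Uncertainty sampling: the k candidates with the smallest gap between
    their nearest and second-nearest centroid."""
    d = np.sort(_distances(model, candidates), axis=1)
    if d.shape[1] < 2:
        return np.arange(min(k, len(candidates)))
    return np.argsort(d[:, 1] - d[:, 0])[:k]


def extract(query_secret, pool, budget, iterations, seed_size, select, train, rng):
    """Budget-limited extraction loop; query_secret returns only top-1 labels."""
    unlabeled = np.arange(len(pool))
    # Seed: a random subset of the public pool, labeled by the secret model.
    chosen = rng.choice(unlabeled, size=seed_size, replace=False)
    X, y = pool[chosen], query_secret(pool[chosen])
    unlabeled = np.setdiff1d(unlabeled, chosen)
    spent = seed_size
    model = train(X, y)
    per_round = (budget - seed_size) // iterations
    for _ in range(iterations):
        k = min(per_round, budget - spent, len(unlabeled))
        if k <= 0:
            break
        # Active learning: map candidate positions back to pool indices.
        picked = unlabeled[select(model, pool[unlabeled], k)]
        X = np.concatenate([X, pool[picked]])
        y = np.concatenate([y, query_secret(pool[picked])])  # spends k queries
        unlabeled = np.setdiff1d(unlabeled, picked)
        spent += k
        model = train(X, y)  # retrain on every label bought so far
    return model, spent


def secret_model(X):
    """Hypothetical black-box classifier the attacker wants to copy."""
    return (X[:, 0] + X[:, 1] > 0).astype(int)


rng = np.random.default_rng(0)
pool = rng.normal(size=(2000, 2))  # stand-in for a public "thief" dataset
model, spent = extract(secret_model, pool, budget=200, iterations=4,
                       seed_size=40, select=margin_select,
                       train=train_centroids, rng=rng)
X_test = rng.normal(size=(1000, 2))
agreement = np.mean(predict(model, X_test) == secret_model(X_test))
print(spent, round(float(agreement), 3))
```

Passing `select` and `train` as parameters keeps the loop agnostic to the strategy, so a DFAL-style ranking or a random baseline that spends the whole budget in one round can be swapped in without touching the budget bookkeeping.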

Paper Structure

This paper contains 38 sections, 6 equations, 7 figures, 5 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Overview of model extraction
  • Figure 2: Adversarial example generation using DeepFool (Moosavi-Dezfooli et al., 2016).
  • Figure 3: Our framework for model extraction (see the section on technical details for an explanation of steps 1-5).
  • Figure 4: Network architecture for image classification tasks
  • Figure 5: The improvement in agreement for image classification experiments with a total budget of 20K over 10 iterations. Since the random-sampling strategy is not run iteratively, it is shown as a horizontal line.
  • ...and 2 more figures
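
The agreement plotted in Figure 5 can be read as the fraction of test inputs on which the substitute and the secret model predict the same label. The short sketch below computes that quantity and the cumulative number of queries for a 20K budget spread over 10 iterations, assuming an even split per iteration; the paper's exact schedule (for example, whether an initial seed set is counted separately) may differ, and both helper names are hypothetical.

```python
import numpy as np


def agreement(substitute_preds, secret_preds):
    """Fraction of test inputs on which substitute and secret model agree."""
    substitute_preds = np.asarray(substitute_preds)
    secret_preds = np.asarray(secret_preds)
    if substitute_preds.shape != secret_preds.shape:
        raise ValueError("prediction arrays must have the same shape")
    return float(np.mean(substitute_preds == secret_preds))


def cumulative_budget(total, iterations):
    """Queries spent after each iteration, assuming an even split of the total."""
    per_iter = total // iterations
    return [per_iter * (i + 1) for i in range(iterations)]


print(agreement([0, 1, 2, 2], [0, 1, 1, 2]))  # 0.75
print(cumulative_budget(20_000, 10)[:3])  # [2000, 4000, 6000]
```

Under this reading, the random baseline in Figure 5 corresponds to a single point at the full 20K budget, which is why it appears as a flat line rather than a curve.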