MisGUIDE : Defense Against Data-Free Deep Learning Model Extraction

Mahendra Gurve; Sankar Behera; Satyadev Ahlawat; Yamuna Prasad

TL;DR

MisGUIDE is a two-step defense framework for deep learning models. It disrupts the adversarial sample generation process by returning a probabilistic response when a query is deemed out-of-distribution (OOD), and it significantly enhances resistance against state-of-the-art data-free model extraction in black-box settings.

Abstract

The rise of Machine Learning as a Service (MLaaS) has led to the widespread deployment of machine learning models trained on diverse datasets. These models serve predictions through APIs, and emerging vulnerabilities in prediction APIs raise concerns about model security and confidentiality. Of particular concern are model cloning attacks, in which an adversary with limited data and no knowledge of the training dataset replicates a victim model's functionality through black-box query access. Such attacks commonly generate adversarial queries, submit them to the victim model, and use its responses to build a labeled dataset. This paper proposes "MisGUIDE", a two-step defense framework for deep learning models that disrupts the adversarial sample generation process by returning a probabilistic response when a query is deemed out-of-distribution (OOD). The first step employs a Vision Transformer-based framework to identify OOD queries, while the second step perturbs the response for such queries, introducing a probabilistic loss function to misguide the attacker. The aim of the proposed defense is to reduce the accuracy of the cloned model while maintaining accuracy on authentic queries. Extensive experiments on two benchmark datasets demonstrate that the proposed framework significantly enhances resistance against state-of-the-art data-free model extraction in black-box settings.
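The two-step flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `victim` and `is_ood` are hypothetical callables standing in for the victim model and the Vision Transformer-based OOD detector, and the misguiding rule used here (swapping the top class probability with a randomly chosen other class) is an assumed placeholder, since this page does not specify the actual misguiding function. The parameter `p` plays the role of the probabilistic threshold.

```python
import random
from typing import Callable, List, Sequence


def misguide_response(
    query: object,
    victim: Callable[[object], Sequence[float]],  # hypothetical: returns class probabilities
    is_ood: Callable[[object], bool],             # hypothetical: stands in for the OOD detector
    p: float,                                     # probability of misguiding an OOD query
    rng: random.Random,
) -> List[float]:
    """Return the victim's output, perturbed with probability p if the query is OOD."""
    probs = list(victim(query))

    # Step 1: in-distribution queries always receive the true prediction.
    if not is_ood(query):
        return probs

    # Step 2 (switch): an OOD query is answered honestly with probability 1 - p.
    if rng.random() >= p:
        return probs

    # Assumed misguiding rule: swap the top probability with a random other class,
    # so the output stays a valid distribution but points at a wrong label.
    top = max(range(len(probs)), key=probs.__getitem__)
    wrong = rng.choice([i for i in range(len(probs)) if i != top])
    probs[top], probs[wrong] = probs[wrong], probs[top]
    return probs
```

With `p = 0` the defense is effectively disabled, and with `p = 1` every query flagged as OOD receives a misleading answer; intermediate values trade off how consistently the attacker's labeled dataset is corrupted.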

Paper Structure (19 sections, 7 equations, 6 figures, 2 tables, 3 algorithms)

Introduction
Related Work and Preliminaries
Model extraction attacks
Defenses against model extraction attacks
Preliminaries
Adversary’s goal
Adversary’s knowledge and capabilities
Data-Free Model Extraction (DFME) Attacks (Truong et al., 2021)
Defender goal
Proposed Methodology
MisGUIDE Framework
Vision Transformer: an OOD detector
Misguiding probabilistic Threshold $p$
Experiments
Experiment Setup
...and 4 more sections

Figures (6)

Figure 1: MisGUIDE framework, highlighting its core components: the Victim Model (M), an OOD Detector, a Misguiding Function introducing controlled randomness, and a Switch Mechanism dynamically deciding accurate or intentionally incorrect predictions.
Figure 2: Distribution plot for CIFAR-10 and CIFAR-100
Figure 3: Distribution plot for CIFAR-10 and DFME queries
Figure 4: Assessment of Cloning Accuracy (CA) for varying Probabilistic Threshold $p$
Figure 5: Plot of embeddings in 2D using t-SNE for CIFAR-10, CIFAR-100 and GAN Generated Queries
...and 1 more figure
