FDINet: Protecting against DNN Model Extraction via Feature Distortion Index

Hongwei Yao, Zheng Li, Haiqin Weng, Feng Xue, Zhan Qin, Kui Ren

TL;DR

FDINet, a novel defense mechanism that leverages the feature distribution of deep neural network (DNN) models, proves to be highly effective in detecting model extraction and exhibits the capability to identify colluding adversaries with an accuracy exceeding 91%.

Abstract

Machine Learning as a Service (MLaaS) platforms have gained popularity due to their accessibility, cost-efficiency, scalability, and rapid development capabilities. However, recent research has highlighted the vulnerability of cloud-based models in MLaaS to model extraction attacks. In this paper, we introduce FDINet, a novel defense mechanism that leverages the feature distribution of deep neural network (DNN) models. Concretely, by analyzing the feature distribution of the adversary's queries, we reveal that it deviates from that of the model's training set. Based on this key observation, we propose the Feature Distortion Index (FDI), a metric designed to quantitatively measure the feature distribution deviation of received queries. FDINet uses FDI to train a binary detector and exploits FDI similarity to identify colluding adversaries in distributed extraction attacks. We conduct extensive experiments to evaluate FDINet against six state-of-the-art extraction attacks on four benchmark datasets and four popular model architectures. Empirical results demonstrate the following findings: (1) FDINet is highly effective in detecting model extraction, achieving 100% detection accuracy on DFME and DaST. (2) FDINet is highly efficient, using just 50 queries to raise an extraction alarm with an average confidence of 96.08% on GTSRB. (3) FDINet can identify colluding adversaries with an accuracy exceeding 91%. Additionally, it can detect two types of adaptive attacks.

Paper Structure

This paper contains 36 sections, 1 theorem, 7 equations, 7 figures, 7 tables, and 1 algorithm.

Key Result

Proposition 5.1

Given two inspected clients $u$ and $v$, and their $n \times bs$ FDI vectors $\mathcal{I}_{u}$ and $\mathcal{I}_{v}$, the null hypothesis can be expressed as: $\mathcal{H}_{0}: \mu_{u} = \mu_{v}$, while the alternative hypothesis is expressed as $\mathcal{H}_{a}: \mu_{u} \neq \mu_{v}$. Though calcul

Figures (7)

  • Figure 1: Overview of the FDINet pipeline. In the first step, we select $K$ anchor samples for each class $c$ (airplane in the figure). In the next step, we measure feature distortion to obtain the FDI vector for each inspected sample. Finally, the extracted FDI vectors are used to build a binary extraction attack detector and a colluding adversaries detector.
  • Figure 2: ROC curve of model extraction attacks detection.
  • Figure 3: Average Extraction Status (ES) for benign and malicious clients (lower is better for benign clients). ES is a metric proposed by Kesarwani et al. (2018) that uses information gain to quantify model privacy leakage from the victim model.
  • Figure 4: Illustration of the confusion matrix for average hypothesis tests' p-values over different clients. If the p-value is higher than 0.05, we accept $\mathcal{H}_{0}$, meaning clients $u$ and $v$ are colluding adversaries.
  • Figure 5: Performance of colluding adversaries detection for distributed attacks. We consider an MLaaS platform with 100 clients, of which between 2 and 20 are colluding adversaries for each attack.
  • ...and 2 more figures
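
The pipeline in Figure 1 (select $K$ anchor samples per class, then measure feature distortion against them) can be illustrated with a minimal sketch. The details here are assumptions for illustration only, since this page does not reproduce Definition 5.1: features are plain Python lists, one per layer; distortion is taken as the Euclidean distance to each anchor; and the $K$ distances are averaged per layer. The names `l2_distance` and `fdi_vector` are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fdi_vector(query_features, anchor_features):
    """Return one distortion value per layer for a single query.

    query_features:  list of per-layer feature vectors for the query.
    anchor_features: list over the K anchors of one class; each anchor is
                     a list of per-layer feature vectors (same layout).

    Assumption (not confirmed by the source): distortion at a layer is the
    mean L2 distance between the query's features and the K anchors' features.
    """
    num_anchors = len(anchor_features)
    fdi = []
    for layer, query_layer in enumerate(query_features):
        total = sum(l2_distance(query_layer, anchor[layer])
                    for anchor in anchor_features)
        fdi.append(total / num_anchors)
    return fdi


# Toy example: K = 2 anchors, 2 layers.
anchors = [
    [[0.0, 0.0], [1.0]],
    [[0.0, 2.0], [3.0]],
]
query = [[0.0, 1.0], [2.0]]
print(fdi_vector(query, anchors))
```

Under these assumptions, a query whose features sit far from the anchors at every layer yields a large FDI vector, which is the kind of signal a binary detector could be trained on.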

Theorems & Definitions (2)

  • Definition 5.1: Feature Distortion Index
  • Proposition 5.1: Two-sample Hypothesis Tests
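
The decision rule behind Proposition 5.1 and Figure 4 (treat clients $u$ and $v$ as colluding when the p-value for $\mathcal{H}_{0}: \mu_{u} = \mu_{v}$ exceeds 0.05) can be sketched as follows. The proposition text is truncated on this page, so the test statistic is an assumption: this sketch uses a large-sample two-sided z-test on scalar FDI values, whereas the paper's FDI vectors are multi-dimensional and its exact test may differ. The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def two_sample_p_value(fdi_u, fdi_v):
    """Two-sided p-value for H0: mu_u == mu_v.

    Assumption: a large-sample z-test with unequal variances; the source
    does not state which two-sample test is used.
    Each input needs at least two values so the sample variance is defined.
    """
    standard_error = math.sqrt(variance(fdi_u) / len(fdi_u)
                               + variance(fdi_v) / len(fdi_v))
    if standard_error == 0.0:
        # Both samples are constant: the means either match exactly or not.
        return 1.0 if mean(fdi_u) == mean(fdi_v) else 0.0
    z = (mean(fdi_u) - mean(fdi_v)) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def are_colluding(fdi_u, fdi_v, alpha=0.05):
    """Apply the rule from Figure 4: p-value above alpha means H0 is kept."""
    return two_sample_p_value(fdi_u, fdi_v) > alpha


similar = are_colluding([1.0, 2.0, 3.0], [1.1, 2.1, 2.9])
different = are_colluding([0.1, 0.2, 0.3, 0.2], [5.1, 5.2, 5.3, 5.2])
print(similar, different)
```

With only a handful of values per client the normal approximation is rough; it is used here because the proposition describes $n \times bs$ values per client, which is typically a large sample.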