Defense Against Model Stealing Based on Account-Aware Distribution Discrepancy

Jian-Ping Mei, Weibin Zhang, Jie Chen, Xuyun Zhang, Tiantian Zhu

TL;DR

This work tackles model stealing from black-box image-classification services by introducing Account-aware Distribution Discrepancy (ADD), a non-parametric detector that exploits account-level dependencies among queries in embedding space. ADD models each class as a Multivariate Normal distribution and uses the squared Fréchet distance to measure the discrepancy between reference statistics and account-specific query statistics. The resulting Malicious Score drives a plug-and-play defense (D-ADD) that applies random prediction poisoning. The approach preserves utility for benign users under both soft- and hard-label outputs and defends against diverse cloning attacks, including adaptive strategies, while remaining training-free and lightweight. Empirical results across multiple datasets show superior detection and robust protection with minimal loss of target-model utility, highlighting practical potential for deployment in commercial APIs and informing future work on integrated defense frameworks.

Abstract

Malicious users attempt to functionally replicate commercial models at low cost by training a clone model on query responses. Preventing such model-stealing attacks in a timely manner, while achieving strong protection and maintaining utility, is challenging. In this paper, we propose a novel non-parametric detector called Account-aware Distribution Discrepancy (ADD) that recognizes queries from malicious users by leveraging account-wise local dependency. We model each class as a Multivariate Normal distribution (MVN) in the feature space and compute the malicious score as the weighted sum of class-wise distribution discrepancies. The ADD detector is combined with random prediction poisoning to yield a plug-and-play defense module, named D-ADD, for image classification models. Extensive experiments show that D-ADD achieves strong defense against different types of attacks with little interference in serving benign users, in both soft-label and hard-label settings.
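The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: covariances are taken to be diagonal (so the squared Fréchet distance reduces to a per-dimension formula, whereas the general case needs a matrix square root), class weights are set to each class's share of the queries in the window, and `poison` is a hypothetical stand-in that shuffles the probability vector. All function names and the threshold are hypothetical.

```python
import math
import random
from collections import defaultdict


def diag_gaussian(features):
    """Per-dimension mean and (population) variance of equal-length vectors."""
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in features) / n for d in range(dim)]
    return mean, var


def frechet_sq_diag(mean_a, var_a, mean_b, var_b):
    """Squared Frechet distance between two Gaussians, ASSUMING diagonal covariance.

    General form: ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2)).
    With diagonal covariances the trace term becomes sum((sqrt(v_a) - sqrt(v_b))^2).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_a, var_b))
    return mean_term + cov_term


def malicious_score(reference, window):
    """Weighted sum of class-wise discrepancies for one account's query window.

    reference: {label: (mean, var)} estimated from the defender's own data.
    window:    list of (feature_vector, predicted_label) for recent queries.
    """
    by_class = defaultdict(list)
    for feats, label in window:
        by_class[label].append(feats)

    score = 0.0
    for label, feats in by_class.items():
        mean, var = diag_gaussian(feats)
        ref_mean, ref_var = reference[label]
        weight = len(feats) / len(window)  # ASSUMED weighting: class share of the window
        score += weight * frechet_sq_diag(mean, var, ref_mean, ref_var)
    return score


def poison(probs, rng):
    """Hypothetical stand-in for random prediction poisoning: shuffle the outputs."""
    poisoned = list(probs)
    rng.shuffle(poisoned)
    return poisoned


def respond(probs, score, threshold, rng):
    """Return clean predictions below the threshold, poisoned ones above it."""
    return poison(probs, rng) if score > threshold else list(probs)


if __name__ == "__main__":
    reference = {0: ([0.0, 0.0], [1.0, 1.0])}
    benign = [([1.0, 1.0], 0), ([-1.0, -1.0], 0), ([1.0, -1.0], 0), ([-1.0, 1.0], 0)]
    shifted = [([5.0, 5.0], 0), ([5.0, 5.0], 0)]
    print(malicious_score(reference, benign))   # matches the reference statistics
    print(malicious_score(reference, shifted))  # far from the reference statistics
```

In this toy setup, a window whose per-class statistics match the reference scores zero, while a window drawn from a shifted distribution scores high and would trigger poisoning once it exceeds the chosen threshold.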

Paper Structure

This paper contains 26 sections, 6 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Overall pipeline of the proposed D-ADD defense. The main idea of the new embedding-space detector ADD is sketched on the right.
  • Figure 2: Illustration of the working principle of the proposed ADD detector.
  • Figure 3: Impact of the sliding window size $N$ on the distribution of Malicious Scores (MS) produced by ADD. Each red dot is calculated from a window of $N$ randomly selected surrogate samples used as malicious queries, and each green dot is calculated from $N$ randomly selected testing images used as benign queries.
  • Figure 4: ROC curves of ADD and two simplified variants for MNIST (malicious: FashionMNIST) and CIFAR-10 (malicious: CIFAR-100). The benign queries are generated three times, each time by randomly selecting a given number of classes.
  • Figure A1: Comparison of the normality of the distance distributions produced by PRADA for benign queries (left) and malicious queries (right). The KnockoffNets stealing attack is simulated by using FashionMNIST and CIFAR-100 to query the classification models trained on MNIST and CIFAR-10, respectively. The Shapiro-Wilk score, which ranges in $[0, 1]$, measures how well a set of values fits a normal distribution.
  • ...and 2 more figures