SAME: Sample Reconstruction against Model Extraction Attacks

Yi Xie; Jie Zhang; Shiqian Zhao; Tianwei Zhang; Xiaofeng Chen

TL;DR

SAME addresses the privacy and intellectual property (IP) risks of Machine Learning as a Service (MLaaS) by detecting model extraction queries through sample reconstruction. It pairs a Masked Auto-encoder with an auxiliary model that repairs predictions on reconstructed data, yielding a robust anomaly score that combines the reconstruction loss with the deviation from the auxiliary model. It requires no auxiliary OOD datasets, user query history, or white-box access, and it can serve as a plug-in add-on to existing defenses. On MNIST/EMNIST and CIFAR datasets, under three model extraction attack (MEA) strategies, SAME outperforms state-of-the-art defenses in AUROC, AUPR, and FPR95 while preserving victim-model fidelity and reducing defense overhead. Extensive ablation analyses justify each component, and the method integrates with reject-prediction or proof-of-work schemes, offering a scalable defense for real-world MLaaS deployments.

Abstract

While deep learning models have achieved strong performance across various domains, their deployment requires extensive resources and advanced computing infrastructure. As a solution, Machine Learning as a Service (MLaaS) has emerged, lowering the barriers for users to release or productize their deep learning models. However, previous studies have highlighted potential privacy and security concerns associated with MLaaS, and one primary threat is model extraction attacks. Many defenses have been proposed, but they rely on unrealistic assumptions and generalize poorly, making them less practical for reliable protection. Driven by these limitations, we introduce a novel defense mechanism, SAME, based on the concept of sample reconstruction. This strategy imposes minimal prerequisites on the defender's capabilities, eliminating the need for auxiliary Out-of-Distribution (OOD) datasets, user query history, white-box model access, and additional intervention during model training. It is compatible with existing active defense methods. Our extensive experiments corroborate the superior efficacy of SAME over state-of-the-art solutions. Our code is available at https://github.com/xythink/SAME.

Paper Structure (28 sections, 6 equations, 7 figures, 4 tables)

Introduction
Preliminaries
Model Extraction Attack
Model Extraction Attack Detection
Threat Model
Motivation
Methodology
Sample Reconstruction via Masked Auto-encoder
Attack Repair via Auxiliary Model
Anomaly Score Calculation
Flexibility as an Add-on
Experiment
Experimental Settings
Datasets and Model Architectures
Attack Methods
...and 13 more sections

Figures (7)

Figure 1: Distributions of anomaly scores for the classifier-based detection (left) and our sample reconstruction-based detection (right). The $x$-axis is in the logarithmic scale due to its long-tailed distribution. We utilize MNIST as normal query samples and employ KnockoffNets (with EMNIST-digits as the proxy set) to generate the malicious query samples. All samples undergo consistent preprocessing.
Figure 2: Distributions of anomaly scores for the reconstruction-based detection without (left) and with (right) Auxiliary Model (AM). The $x$-axis is in the logarithmic scale due to its long-tailed distribution. We utilize CIFAR-10 as normal query samples and employ JBDA (with 200 seed samples) to generate the malicious query samples. All samples undergo consistent preprocessing.
Figure 3: The workflow of the proposed SAME. Whenever a sample is received, a fully trained masked auto-encoder first performs sample reconstruction. The reconstructed sample is then fed into an auxiliary model that outputs an auxiliary prediction. The overall anomaly score is calculated based on two samples and two predictions. After that, an appropriate response strategy is selected according to the anomaly score. The victim model remains frozen throughout the defense.
Figure 4: Comparison of flexibility as an add-on.
Figure 5: Detection performance of SAME and its variants on the MNIST and CIFAR-10 datasets.
...and 2 more figures
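
The workflow in the Figure 3 caption (reconstruct the query, obtain an auxiliary prediction on the reconstruction, then score the query from two samples and two predictions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the mean-squared-error reconstruction loss, the KL divergence used as the prediction deviation, the `weight` parameter, the `threshold`, and every function name here are hypothetical choices. The source only states that the score combines a reconstruction loss with a deviation from the auxiliary model. The `reconstruct`, `victim`, and `auxiliary` arguments are stand-in callables for the masked auto-encoder, the frozen victim model, and the auxiliary model.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def reconstruction_loss(x: Vector, x_rec: Vector) -> float:
    """Mean squared error between a query and its reconstruction (assumed loss)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)


def prediction_deviation(p_victim: Vector, p_aux: Vector, eps: float = 1e-12) -> float:
    """KL(p_victim || p_aux); the choice of divergence is an assumption."""
    return sum(
        max(p, eps) * math.log(max(p, eps) / max(q, eps))
        for p, q in zip(p_victim, p_aux)
    )


def anomaly_score(
    x: Vector,
    reconstruct: Callable[[Vector], Vector],
    victim: Callable[[Vector], Vector],
    auxiliary: Callable[[Vector], Vector],
    weight: float = 1.0,
) -> float:
    """Score one query from two samples (x, x_rec) and two predictions."""
    x_rec = reconstruct(x)        # stand-in for the masked auto-encoder
    p_victim = victim(x)          # frozen victim model on the original query
    p_aux = auxiliary(x_rec)      # auxiliary model on the reconstructed query
    # Assumed combination rule: a weighted sum of the two terms.
    return reconstruction_loss(x, x_rec) + weight * prediction_deviation(p_victim, p_aux)


def is_suspicious(score: float, threshold: float) -> bool:
    """Hypothetical response rule: flag queries whose score exceeds a threshold."""
    return score > threshold


def toy_model(x: Vector) -> List[float]:
    """Hypothetical two-class classifier used only to exercise the sketch."""
    p = 1.0 / (1.0 + math.exp(-sum(x)))
    return [p, 1.0 - p]


query = [0.2, 0.4, 0.6]
well_reconstructed = anomaly_score(query, lambda v: list(v), toy_model, toy_model)
poorly_reconstructed = anomaly_score(
    query, lambda v: [a + 1.0 for a in v], toy_model, toy_model
)
print(well_reconstructed, poorly_reconstructed)
```

In this toy setup, a query that the reconstructor reproduces well receives a lower score than one it distorts, which is the intuition behind using reconstruction quality as a detection signal. A flagged query could then be routed to a response strategy such as the reject-prediction or proof-of-work schemes mentioned in the TL;DR.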
