
Model-Guardian: Protecting against Data-Free Model Stealing Using Gradient Representations and Deceptive Predictions

Yunfei Yang, Xiaojun Chen, Yuexin Xuan, Zhendong Zhao

TL;DR

This work tackles data-free model stealing in MLaaS by introducing Model-Guardian, a defense that combines a gradient-based Data-Free Model Stealing Detector (DFMS-Detector) with Deceptive Predictions (DPreds). DFMS-Detector learns to identify synthetic-query artifacts by transforming inputs into gradients and training an ensemble of binary detectors, enhancing generalization across diverse GANs and diffusion models. DPreds perturbs the probabilities returned to malicious queries to disrupt clone-model training while preserving benign accuracy, and the system can terminate further access if malicious queries exceed a threshold. Extensive experiments on CIFAR-10/100 and ImageNet across seven attacks and multiple generative models demonstrate state-of-the-art performance, strong generalization, and minimal impact on legitimate users, offering a practical defense for MLaaS deployments against data-free threats.
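The response behaviour described above (answer benign queries normally, return perturbed probabilities for queries flagged as malicious, and cut off access once too many are flagged) can be sketched roughly as follows. This is only an illustration under stated assumptions. The names `ResponsePolicy`, `placeholder_perturb`, and `threshold` are hypothetical. The rank-reversing perturbation is a stand-in chosen for simplicity and is not the paper's DPreds algorithm. The `is_malicious` flag is assumed to come from a detector such as DFMS-Detector, which is not implemented here.

```python
from collections import defaultdict

import numpy as np


def placeholder_perturb(probs):
    """Stand-in perturbation (NOT the paper's DPreds): reverse the rank order
    of the class probabilities, so the least likely class receives the largest
    value. The output is still a valid probability vector."""
    order = np.argsort(probs)            # class indices, least to most likely
    out = np.empty_like(probs)
    out[order] = probs[order[::-1]]      # smallest class gets the largest prob
    return out


class ResponsePolicy:
    """Hypothetical per-user response logic driven by a detector's verdict."""

    def __init__(self, threshold=2):
        self.threshold = threshold       # assumed limit on flagged queries
        self.flagged = defaultdict(int)  # user_id -> flagged-query count
        self.blocked = set()             # users whose access was terminated

    def respond(self, user_id, probs, is_malicious):
        if user_id in self.blocked:
            return None                  # access already terminated
        probs = np.asarray(probs, dtype=float)
        if not is_malicious:
            return probs                 # benign queries are left untouched
        self.flagged[user_id] += 1
        if self.flagged[user_id] > self.threshold:
            self.blocked.add(user_id)    # count exceeded the threshold
            return None
        return placeholder_perturb(probs)


policy = ResponsePolicy(threshold=2)
benign_out = policy.respond("alice", [0.7, 0.2, 0.1], is_malicious=False)
first_flagged = policy.respond("mallory", [0.7, 0.2, 0.1], is_malicious=True)
```

In this sketch a benign user's predictions are returned unchanged, which mirrors the stated goal of preserving benign accuracy, while a flagged user receives misleading probabilities until the per-user count exceeds `threshold`.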

Abstract

Model stealing attacks increasingly threaten the confidentiality of machine learning models deployed in the cloud. Recent studies reveal that adversaries can exploit data synthesis techniques to steal machine learning models even when no real data is available, giving rise to data-free model stealing attacks. Existing defenses against such attacks suffer from limitations, including poor effectiveness, insufficient generalization ability, and low comprehensiveness. In response, this paper introduces a novel defense framework named Model-Guardian. Comprising two components, Data-Free Model Stealing Detector (DFMS-Detector) and Deceptive Predictions (DPreds), Model-Guardian is designed to address the shortcomings of current defenses by exploiting the artifact properties of synthetic samples and the gradient representations of samples. Extensive experiments on seven prevalent data-free model stealing attacks demonstrate the effectiveness and superior generalization ability of Model-Guardian, which outperforms eleven defense methods and establishes a new state of the art. Notably, this work pioneers the use of various GANs and diffusion models to generate highly realistic query samples in attacks, and Model-Guardian detects these samples accurately.

Paper Structure

This paper contains 24 sections, 7 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: Overview of our proposed Model-Guardian. During training, synthetic images from a randomly selected Data-Free Model Stealing (DFMS) attack, Generative Adversarial Network (GAN), and Diffusion Model (DM) are combined with real images to form three distinct training sets. A pre-trained transformation model converts these data into gradients, which are used to train three sub-detectors that are later integrated into a unified detector. In the detection phase, a user's query samples are converted into gradients and passed to the detector for evaluation. During the response phase, the model adapts its output based on the query record.
  • Figure 2: Visualization of gradients and Class Activation Maps (CAMs) extracted from the detector on real and synthetic images.
  • Figure 3: Visualization of class probabilities before perturbation (second and fifth columns) and after perturbation (third and sixth columns) on synthetic query images from six different sources (DaST, StyleGAN, and ADM in the first column; DFME, GauGAN, and DALL-E in the fourth column).
  • Figure 4: Model stealing attack and its vulnerabilities. Attackers query models deployed in the cloud through APIs, using surrogate or synthetic data, to obtain the corresponding predictions. They can then use these annotated data to train a clone model with functionality similar to the original model and carry out further malicious actions.
  • Figure 5: t-SNE visualizations for both the normal and the defensive ResNet-34 on CIFAR-10. Each dot represents one data point.