RADEP: A Resilient Adaptive Defense Framework Against Model Extraction Attacks

Amit Chakraborty, Sayyed Farid Ahamed, Sandip Roy, Soumya Banerjee, Kevin Choi, Abdul Rahman, Alison Hu, Edward Bowen, Sachin Shetty

TL;DR

RADEP tackles model-extraction threats in MLaaS by combining progressive adversarial training, malicious query detection, adaptive output perturbation, and ownership verification into a single multi-layer defense. It introduces a hybrid detection score based on uncertainty and query behavior, an adaptive response mechanism that degrades outputs for suspicious queries while preserving legitimate interactions, and dual ownership verification via backdoors and watermarking. Empirical results show reduced extraction success and high detection accuracy across multiple datasets and attacks, with low per-query overhead and resilience to adaptive adversaries. The framework offers a practical, scalable way to secure MLaaS deployments against functionality theft and unauthorized use.

Abstract

Machine Learning as a Service (MLaaS) enables users to leverage powerful machine learning models through cloud-based application programming interfaces (APIs), offering scalability and ease of deployment. However, these services are vulnerable to model extraction attacks, in which adversaries repeatedly query the API to reconstruct a functionally similar model, compromising intellectual property and security. Although various defense strategies have been proposed, many suffer from high computational costs, limited adaptability to evolving attack techniques, and degraded performance for legitimate users. In this paper, we introduce the Resilient Adaptive Defense Framework for Model Extraction Attack Protection (RADEP), which counters model extraction attacks through a multi-layered security approach. RADEP employs progressive adversarial training to enhance model resilience against extraction attempts. It detects malicious queries by combining uncertainty quantification with behavioral pattern analysis. Furthermore, we develop an adaptive response mechanism that dynamically modifies query outputs based on their suspicion scores, reducing the utility of stolen models. Finally, ownership verification is enforced through embedded watermarking and backdoor triggers, enabling reliable identification of unauthorized model use. Experimental evaluations demonstrate that RADEP significantly reduces extraction success rates while maintaining high detection accuracy with minimal impact on legitimate queries, and that it remains resilient even against adaptive adversaries, making it a reliable security framework for MLaaS models.
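
To make the detection and response steps concrete, the following is a minimal sketch of how an uncertainty signal and a behavioral signal might be combined into a suspicion score that then controls output perturbation. It is not RADEP's actual formulation, which this summary does not give. The function names, the entropy-based uncertainty measure, the weight `alpha`, the `threshold`, and the blend-toward-uniform perturbation are all assumptions chosen for illustration.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a probability vector, scaled to [0, 1].

    Assumes at least two classes (otherwise the normalizer is zero).
    """
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def suspicion_score(probs, behavior_score, alpha=0.5):
    """Hypothetical hybrid score in [0, 1].

    Mixes prediction uncertainty with a behavioral signal in [0, 1]
    (for example, a normalized query-rate anomaly). The weight `alpha`
    is an illustrative assumption, not a value from the paper.
    """
    return alpha * normalized_entropy(probs) + (1 - alpha) * behavior_score


def adaptive_response(probs, score, threshold=0.5):
    """Return the output unchanged below `threshold`; above it, blend the
    output toward the uniform distribution in proportion to the excess.

    Assumes threshold < 1.
    """
    if score < threshold:
        return list(probs)
    k = len(probs)
    strength = (score - threshold) / (1 - threshold)  # 0 at threshold, 1 at score 1
    return [(1 - strength) * p + strength / k for p in probs]


probs = [0.7, 0.2, 0.1]
benign = adaptive_response(probs, suspicion_score(probs, behavior_score=0.1))
flagged = adaptive_response(probs, suspicion_score(probs, behavior_score=0.9))
```

In this sketch the low-suspicion query receives the original probabilities, while the high-suspicion query receives a flattened distribution that still sums to one, so a surrogate model trained on it would learn from less informative soft labels.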

Paper Structure

This paper contains 23 sections, 4 equations, 1 figure, 4 tables, 1 algorithm.

Figures (1)

  • Figure 1: Architecture of the proposed framework RADEP.