SnatchML: Hijacking ML models without Training Access

Mahmoud Ghorbel, Halima Bouzidi, Ioan Marius Bilasco, Ihsen Alouani

TL;DR

SnatchML demonstrates a training-free, inference-time model hijacking threat that exploits over-parameterization to repurpose deployed models for hijacking tasks, with no access to the training process. By extracting the knowledge a benign model exposes through its logits or feature maps and performing zero-shot, distance-based classification, an attacker can hijack models for tasks with more classes than the original, or even for unrelated tasks, in both black-box and white-box settings. The work systematically explores diverse scenarios, from emotion recognition to medical ECG and pneumonia tasks, shows practical attack efficacy on AWS SageMaker, and proposes defenses including model compression and meta-unlearning, along with a theoretical discussion of why over-parameterization enables such exploits. These findings carry significant security and regulatory implications: secure deployment must account for inference-time vulnerabilities and capacity-induced leakage, not only traditional training-time attacks.

Abstract

Model hijacking can cause significant accountability and security risks, since the owner of a hijacked model can be framed for having their model offer illegal or unethical services. Prior works consider model hijacking as a training-time attack, whereby an adversary requires full access to the ML model training. In this paper, we consider a stronger threat model for an inference-time hijacking attack, where the adversary has no access to the training phase of the victim model. Our intuition is that ML models, which are typically over-parameterized, might have the capacity to (unintentionally) learn more than the intended task they are trained for. We propose SnatchML, a new training-free model hijacking attack that leverages the extra capacity learnt by the victim model to infer different tasks that can be semantically related or unrelated to the original one. Our results on models deployed on AWS SageMaker show that SnatchML can deliver high accuracy on hijacking tasks. Interestingly, while all previous approaches are limited by the number of classes in the benign task, SnatchML can hijack models for tasks that contain more classes than the original. We explore different methods to mitigate this risk. We propose meta-unlearning, which is designed to help the model unlearn a potentially malicious task while training for the original task. We also provide insights on over-parameterization as a possible inherent factor that facilitates model hijacking and, accordingly, propose a compression-based countermeasure to counteract this attack. We believe this work offers a previously overlooked perspective on model hijacking attacks, presenting a stronger threat model and higher applicability in real-world contexts.
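
The distance-based, training-free classification summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_references` and `hijack_predict`, the use of per-class mean vectors as references, and the choice of cosine similarity are all assumptions made for illustration. The inputs stand in for whatever the victim model returns at inference time (logits in a black-box setting, feature maps in a white-box one).

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def build_references(outputs, labels):
    """Hypothetical helper: average the victim model's outputs per hijacking label.

    outputs: (n_samples, d) array of logits or flattened feature maps.
    labels:  (n_samples,) hijacking-task labels chosen by the attacker.
    """
    outputs = np.atleast_2d(np.asarray(outputs, dtype=float))
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.stack([outputs[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def hijack_predict(query_outputs, classes, centroids, top_k=1):
    """Rank hijacking classes by cosine similarity to each query output."""
    sims = l2_normalize(query_outputs) @ l2_normalize(centroids).T
    order = np.argsort(-sims, axis=1)[:, :top_k]
    return classes[order]  # shape: (n_queries, top_k)


# Toy data (made up): a 3-output "victim" reused for a 4-class hijacking task.
ref_outputs = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0],
               [0, 0.8, 0.2], [0, 0, 1], [0.6, 0.6, 0]]
ref_labels = ["a", "a", "b", "b", "c", "d"]
classes, centroids = build_references(ref_outputs, ref_labels)
print(hijack_predict([[1.0, 0.05, 0.0], [0.5, 0.5, 0.0]], classes, centroids, top_k=2))
```

No gradient step or weight update appears anywhere in the sketch; the only attacker-side state is the small reference set. Because classes are separated by direction in the output space rather than by output index, the number of hijacking classes is not tied to the number of outputs of the original task, which is the property the abstract highlights.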

Paper Structure (38 sections, 4 equations, 21 figures, 10 tables, 1 algorithm)

This paper contains 38 sections, 4 equations, 21 figures, 10 tables, 1 algorithm.

Figures (21)

  • Figure 1: An attacker can use the output of a model trained on task $\mathcal{T}$ to infer a different task ($\mathcal{T}'$). For example, a user can be identified by simply comparing the similarity between feature maps of a query and a reference identity class.
  • Figure 2: Examples of the top-5 candidates output by the hijacked emotion recognition (ER) model for user re-identification on the CK+ dataset.
  • Figure 3: SnatchML performance on hijacking an ER model for user re-identification. 'Hijacking LB' and 'Hijacking UB' refer to the lower and upper bounds, respectively.
  • Figure 4: ER model used to identify users from the Olivetti dataset (Samaria and Harter, 1994). The top-5 most similar users from the hijacking reference database are displayed for each query.
  • Figure 5: SnatchML performance on hijacking an ER model for biometric identification of users from the Olivetti dataset.
  • ...and 16 more figures

Theorems & Definitions (1)

  • Definition 1