SnatchML: Hijacking ML models without Training Access
Mahmoud Ghorbel, Halima Bouzidi, Ioan Marius Bilasco, Ihsen Alouani
TL;DR
SnatchML demonstrates a training-free, inference-time model hijacking threat that exploits over-parameterization to repurpose deployed models for hijacking tasks, with no access to the training phase or training data. By extracting benign knowledge from logits or feature maps and performing zero-shot, distance-based classification, an attacker can hijack models for tasks with more classes than the original or even for unrelated tasks, in both black-box and white-box settings. The work systematically explores diverse scenarios, from emotion recognition to medical ECG and pneumonia tasks, shows practical attack efficacy on AWS SageMaker, and proposes defenses including model compression and meta-unlearning, along with a theoretical discussion of why over-parameterization enables such exploits. These findings highlight significant security and regulatory implications, suggesting that secure deployment must consider inference-time vulnerabilities and capacity-induced leakage beyond traditional training-time attacks.
Abstract
Model hijacking can cause significant accountability and security risks since the owner of a hijacked model can be framed for having their model offer illegal or unethical services. Prior works consider model hijacking as a training-time attack, whereby an adversary requires full access to the ML model training. In this paper, we consider a stronger threat model for an inference-time hijacking attack, where the adversary has no access to the training phase of the victim model. Our intuition is that ML models, which are typically over-parameterized, might have the capacity to (unintentionally) learn more than the intended task they are trained for. We propose SnatchML, a new training-free model hijacking attack that leverages the extra capacity learnt by the victim model to infer different tasks that can be semantically related or unrelated to the original one. Our results on models deployed on AWS SageMaker show that SnatchML can deliver high accuracy on hijacking tasks. Interestingly, while all previous approaches are limited by the number of classes in the benign task, SnatchML can hijack models for tasks that contain more classes than the original. We explore different methods to mitigate this risk: we propose meta-unlearning, which is designed to help the model unlearn a potentially malicious task while training for the original task. We also provide insights on over-parameterization as a possible inherent factor that facilitates model hijacking, and accordingly, we propose a compression-based countermeasure to counteract this attack. We believe this work offers a previously overlooked perspective on model hijacking attacks, presenting a stronger threat model and higher applicability in real-world contexts.
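The distance-based, training-free inference described above can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: `victim_logits` is a hypothetical stand-in for a deployed model's query interface (here a toy fixed linear map with 3 output logits), `hijack_predict` is a hypothetical helper, and cosine similarity to a handful of labeled reference samples is one possible choice of distance. The toy hijacking task has 4 classes, one more than the victim's 3 outputs, to show why the class count of the benign task need not bound the hijacking task.

```python
import numpy as np

# Hypothetical victim: a fixed map from 4 input features to 3 logits.
# In practice this would be a call to a deployed model endpoint.
_W = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [1.0, 1.0, 1.0]])

def victim_logits(x: np.ndarray) -> np.ndarray:
    """Return the victim's logits for a batch of inputs (black-box view)."""
    return x @ _W

def hijack_predict(queries, ref_inputs, ref_labels):
    """Label each query with the class of its most similar reference sample,
    where similarity is the cosine between the victim's logit vectors."""
    q = victim_logits(queries)
    r = victim_logits(ref_inputs)
    # Normalize rows so the dot product equals cosine similarity.
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-12)
    r = r / (np.linalg.norm(r, axis=1, keepdims=True) + 1e-12)
    sims = q @ r.T  # shape: (num_queries, num_references)
    return ref_labels[np.argmax(sims, axis=1)]

# One labeled reference sample per hijacking class (4 classes, 3 logits).
refs = np.eye(4)
labels = np.array([0, 1, 2, 3])

queries = np.array([[2.0, 0.1, 0.0, 0.0],   # close to reference 0
                    [0.0, 0.0, 0.0, 3.0]])  # close to reference 3
print(hijack_predict(queries, refs, labels))
```

No parameter of the victim is updated at any point: the attacker only queries the model and compares outputs, which is what makes this style of attack training-free. In a white-box setting, the same comparison could be applied to intermediate feature maps instead of logits.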
