CAMH: Advancing Model Hijacking Attack in Machine Learning
Xing He, Jiahao Chen, Yuwen Pu, Qingming Li, Chunyi Zhou, Yingcai Wu, Jinbao Li, Shouling Ji
TL;DR
CAMH introduces a category-agnostic model hijacking framework that overcomes class-count mismatches and data-distribution gaps while preserving the original model’s performance. It combines a Synchronized Optimization Layer, noise-alignment perturbations, and a dual-loop training scheme to enable effective hijacking under outsourcing and model-marketplace scenarios. Across MNIST, SVHN, GTSRB, CIFAR10, and CIFARm, CAMH achieves high camouflage (CR ≈ 1) and strong hijacking efficacy (ER typically > 0.85), even with limited hijacking data and when the hijacking task has more classes than the original. The results underscore potential security risks in third-party training and pre-trained-model ecosystems and motivate continued development of robust defenses and detection strategies.
Abstract
In the burgeoning domain of machine learning, reliance on third-party services for model training and the adoption of pre-trained models have surged. However, this reliance introduces vulnerabilities to model hijacking attacks, in which adversaries manipulate models to perform unintended tasks without the model owner's knowledge, raising significant security and ethical concerns. For example, an ordinary image classifier could be turned into a tool for detecting faces in pornographic content. This paper introduces Category-Agnostic Model Hijacking (CAMH), a novel model hijacking attack method that addresses three challenges: class number mismatch, data distribution divergence, and performance balance between the original and hijacking tasks. CAMH incorporates synchronized training layers, random noise optimization, and a dual-loop optimization approach to ensure minimal impact on the original task's performance while effectively executing the hijacking task. We evaluate CAMH across multiple benchmark datasets and network architectures, demonstrating potent attack effectiveness with minimal degradation in the performance of the original task.
