Osmosis Distillation: Model Hijacking with the Fewest Samples

Yuchen Shi, Huajie Chen, Heng Xu, Zhiquan Liu, Jialiang Shen, Chi Liu, Shuai Zhou, Tianqing Zhu, Wanlei Zhou

TL;DR

This paper proposes the Osmosis Distillation (OD) attack, a novel model hijacking strategy that targets deep learning models using the fewest samples. The attack works across diverse model architectures and enables model hijacking in transfer learning with considerable attack performance and model utility.

Abstract

Transfer learning leverages knowledge from pre-trained models to solve new tasks with limited data and computational resources. Meanwhile, dataset distillation has emerged as a way to synthesize a compact dataset that preserves the critical information of the original large dataset. Combining transfer learning with dataset distillation therefore offers promising performance in evaluations. However, a non-negligible security threat has so far gone unnoticed in transfer learning on synthetic datasets generated by dataset distillation methods: an adversary can perform a model hijacking attack with only a few poisoned samples in the synthetic dataset. To reveal this threat, we propose the Osmosis Distillation (OD) attack, a novel model hijacking strategy that targets deep learning models using the fewest samples. Comprehensive evaluations on various datasets demonstrate that the OD attack attains high attack success rates on hidden tasks while preserving high model utility on the original tasks. Furthermore, the distilled osmosis set enables model hijacking across diverse model architectures, allowing model hijacking in transfer learning with considerable attack performance and model utility. We argue that awareness of the risks of using third-party synthetic datasets in transfer learning must be raised.
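The abstract reports two quantities: model utility on the original task and attack success rate on the hidden (hijacking) task. The sketch below shows one common way such metrics are computed in model hijacking work. It is not taken from the paper. It assumes that utility is plain accuracy on the original test set and that attack success rate is accuracy on the hijacking test set after its labels are mapped onto original-task labels. The names `accuracy`, `attack_success_rate`, and `label_map`, as well as the toy data, are hypothetical.

```python
from typing import Dict, Sequence


def accuracy(predictions: Sequence[int], targets: Sequence[int]) -> float:
    """Fraction of predictions that equal their targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if len(targets) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(int(p == t) for p, t in zip(predictions, targets))
    return correct / len(targets)


def attack_success_rate(
    predictions: Sequence[int],
    hijack_labels: Sequence[int],
    label_map: Dict[int, int],
) -> float:
    """Accuracy on the hijacking task, measured in original-task label space.

    Assumption (not from the paper): each hijacking label is mapped to one
    original-task label, and the model's raw output is compared to that.
    """
    mapped = [label_map[y] for y in hijack_labels]
    return accuracy(predictions, mapped)


# Toy data, purely illustrative.
original_preds = [0, 1, 2, 2]
original_labels = [0, 1, 2, 1]
utility = accuracy(original_preds, original_labels)

label_map = {7: 0, 8: 1, 9: 2}  # hypothetical hijacking label -> original label
hijack_preds = [0, 1, 2, 0]     # model outputs on hijacking queries
hijack_labels = [7, 8, 9, 8]    # ground truth in the hijacking label space
asr = attack_success_rate(hijack_preds, hijack_labels, label_map)

print(f"utility={utility:.2f}, attack success rate={asr:.2f}")
```

Under these assumptions, a successful attack keeps `utility` close to that of a clean model while driving `asr` high.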

Paper Structure

This paper contains 38 sections, 7 equations, 13 figures, 3 tables, and 2 algorithms.

Figures (13)

  • Figure 1: Overview of our work. Unlike backdoor attacks, the OD attack incorporates a hijacking task into the original task by generating a distilled osmosis dataset that achieves the hijacking task with the fewest samples.
  • Figure 2: The workflow of the OD attack. In stage (a), a Transporter embeds the hijacking task into the original task, producing osmosis samples, which are then distilled using image reconstruction, label reconstruction, and training trajectory matching. In stage (b), only the distilled osmosis dataset is used to train the target model. The trained model executes either the original task or the hijacking task, depending on the query.
  • Figure 3: Visualization of the output of the OD attack. Panel (a) shows samples of the original dataset, panel (b) shows samples of the hijacking dataset, and panel (c) shows the distilled osmosis samples.
  • Figure 4: Comparison of the clean model, CAMH [DBLP:conf/aaai/HeCPLZWLJ25], Chameleon [salem2022get], and our approach with images per class (IPC) $= 50$. The first row presents results using the ResNet18 architecture, while the second row displays results obtained with VGG16. Each panel is labeled with the original dataset followed by the hijacking dataset. These results show that the OD attack preserves high hijacking performance even with limited samples, while delivering considerable utility.
  • Figure 5: Evaluation of the OD attack under different IPC values. The left column displays results using the ResNet18 architecture, while the right column displays results obtained with VGG16. The rows correspond to different dataset pairs.
  • ...and 8 more figures