Model Hijacking Attack in Federated Learning

Zheng Li, Siyuan Wu, Ruichuan Chen, Paarijaat Aditya, Istemi Ekin Akkus, Manohar Vanga, Min Zhang, Hao Li, Yang Zhang

TL;DR

This work introduces HijackFL, the first model hijacking attack targeting the global model in federated learning. The attack learns per-class pixel-level cloaks that move hijacking samples toward the features of original classes in feature space, so the global model assigns each cloaked hijacking sample to the intended original class, all without modifying any local model. The attack preserves the global model’s utility on the original task while delivering high attack success on hijacking tasks, outperforming data-poisoning and model-poisoning baselines across four datasets and three model architectures. The paper also discusses potential defenses, including feature-based anomaly detection and adversarial-example defenses, and candidly addresses limitations such as the requirement that the hijacking dataset have fewer classes than the original task. Overall, HijackFL highlights accountability and parasitic-computation risks in FL and motivates further defense research.

Abstract

Machine learning (ML), driven by prominent paradigms such as centralized and federated learning, has made significant progress in various critical applications ranging from autonomous driving to face recognition. However, its remarkable success has been accompanied by various attacks. Recently, the model hijacking attack has shown that ML models can be hijacked to execute tasks different from their original tasks, which increases both accountability and parasitic computational risks. Nevertheless, thus far, this attack has only focused on centralized learning. In this work, we broaden the scope of this attack to the federated learning domain, where multiple clients collaboratively train a global model without sharing their data. Specifically, we present HijackFL, the first-of-its-kind hijacking attack against the global model in federated learning. The adversary aims to force the global model to perform a different task (called the hijacking task) from its original task without the server or benign clients noticing. To accomplish this, unlike existing methods that use data poisoning to modify the target model's parameters, HijackFL searches for pixel-level perturbations (called cloaks) based on the adversary's unmodified local model to align hijacking samples with the original ones in the feature space. When performing the hijacking task, the adversary applies these cloaks to the hijacking samples, compelling the global model to identify them as original samples and predict them accordingly. We conduct extensive experiments on four benchmark datasets and three popular models. Empirical results demonstrate that HijackFL outperforms the baselines. We further investigate the factors that affect its performance and discuss possible defenses to mitigate its impact.
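
The feature-space alignment described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy linear feature extractor `W` standing in for the adversary's local model, a single cloak blended into each sample by a convex combination with a hypothetical weight `alpha`, and a squared-distance loss to the mean feature of one original class. All names, sizes, and hyperparameters below are illustrative assumptions.

```python
import numpy as np

def blend(x, cloak, alpha):
    """Convex combination of a sample and a cloak, with alpha in [0, 1]."""
    return (1.0 - alpha) * x + alpha * cloak

def mean_feature_loss(W, samples, cloak, alpha, target):
    """Mean squared distance between cloaked-sample features and the target."""
    diffs = blend(samples, cloak, alpha) @ W.T - target  # shape (n, d_feat)
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

def optimize_cloak(W, samples, target, alpha=0.3, lr=0.5, steps=300):
    """Projected gradient descent on the cloak only; W is never modified."""
    cloak = np.full(samples.shape[1], 0.5)  # start from a mid-gray cloak
    for _ in range(steps):
        diffs = blend(samples, cloak, alpha) @ W.T - target
        # Gradient of the mean loss with respect to the cloak.
        grad = 2.0 * alpha * np.mean(diffs @ W, axis=0)
        # Keep the cloak inside the valid pixel range [0, 1].
        cloak = np.clip(cloak - lr * grad, 0.0, 1.0)
    return cloak

# Toy data (assumption): 16-pixel "images" and 4-dimensional features.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16)) / 4.0
original = rng.uniform(0.6, 1.0, size=(20, 16))  # samples of one original class
hijack = rng.uniform(0.0, 0.4, size=(20, 16))    # samples of one hijacking class
target = (original @ W.T).mean(axis=0)           # mean original-class feature

cloak = optimize_cloak(W, hijack, target)
before = mean_feature_loss(W, hijack, np.full(16, 0.5), 0.3, target)
after = mean_feature_loss(W, hijack, cloak, 0.3, target)
print(f"feature loss before: {before:.4f}, after: {after:.4f}")
```

Only the cloak is updated in this sketch, which mirrors the abstract's point that the attack leaves model parameters untouched. A real attack would replace `W` with a deep network's feature extractor and compute gradients by automatic differentiation.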

Paper Structure (32 sections, 8 equations, 15 figures, 8 tables, 1 algorithm)

Figures (15)

  • Figure 1: The overview of model hijacking attack in federated learning. The original task is CIFAR-10, while the hijacking task is MNIST.
  • Figure 2: The overview of cloaking the hijacking samples, which can then be classified by the global model. The background is the decision boundary of the original task. The colored points represent the features of hijacking samples.
  • Figure 3: The overview of the class-specific cloaking. The background is the decision boundary of the original task. The colored points represent the features of hijacking samples. The negative anchor feature $\Phi_{y^{\ast}}$ is fixed in the lower right region.
  • Figure 4: The greedy-based class mapping. The number of hijacking samples is the same for each hijacking class (i.e., each row). Each cell's value indicates the count of hijacking samples from that hijacking class mapped to the original class. NA indicates this original class is used for negative anchor features and not assigned to a particular hijacking class.
  • Figure 5: The convex combination of the hijacking sample and the cloak.
  • ...and 10 more figures
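
The greedy class mapping summarized in the Figure 4 caption can be sketched as follows. The exact procedure is not given in this summary, so the code rests on assumptions: `counts[h][o]` is taken to be the number of samples from hijacking class `h` that land on original class `o`, the largest remaining cell is assigned first, and each original class is used at most once. Original classes left unassigned correspond to the "NA" entries that the caption reserves for negative anchor features. The function name and the example matrix are hypothetical.

```python
def greedy_class_mapping(counts):
    """Greedily map each hijacking class (row) to a distinct original class (column).

    Returns (mapping, unassigned), where mapping[h] is the original class chosen
    for hijacking class h and unassigned lists the leftover original classes.
    """
    n_hijack = len(counts)
    n_orig = len(counts[0])
    if n_hijack >= n_orig:
        # The summary states that the hijacking task needs fewer classes.
        raise ValueError("hijacking task must have fewer classes than the original task")

    # Visit cells from largest to smallest count; ties break by (row, column).
    cells = sorted(
        ((counts[h][o], h, o) for h in range(n_hijack) for o in range(n_orig)),
        key=lambda cell: (-cell[0], cell[1], cell[2]),
    )

    mapping = {}
    used_orig = set()
    for _, h, o in cells:
        if h not in mapping and o not in used_orig:
            mapping[h] = o
            used_orig.add(o)

    unassigned = [o for o in range(n_orig) if o not in used_orig]
    return mapping, unassigned

# Example (assumption): 3 hijacking classes, 4 original classes, 10 samples per row.
example_counts = [
    [5, 3, 0, 2],
    [6, 1, 2, 1],
    [1, 1, 4, 4],
]
example_mapping, example_unassigned = greedy_class_mapping(example_counts)
print(example_mapping, example_unassigned)
```

In this example, hijacking class 1 takes original class 0 first because it holds the largest cell, which forces hijacking class 0 onto its second-best column. This illustrates why a greedy assignment need not be optimal for every row.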